In this episode of "No Math AI," Akash and Isha visit the Red Hat Summit to talk with Red Hat CEO Matt Hicks and CTO Chris Wright about the practical work of bringing inference time scaling (also called test-time scaling or test-time compute) to enterprise users worldwide.

Matt Hicks explores the pivotal role of an AI platform in abstracting complexity and absorbing cost as AI shifts from static models to dynamic, agentic applications. These applications lean heavily on inference-time-scaling techniques such as reasoning and particle filtering, which generate many more tokens in exchange for greater accuracy. Hicks emphasizes that platforms must lower the unit price of these capabilities, make the techniques easy for enterprises to adopt, and build confidence through cost transparency, countering the "fear response" triggered by unpredictable expenses when more inferencing is performed.

Chris Wright outlines the open-source AI roadmap for deploying these new, inference-heavy technologies reliably in production. He discusses the challenge of moving beyond single-instance inference to a distributed infrastructure that can serve many concurrent users and efficiently handle the massive token generation these scaled inference processes require. Wright introduces LLM-d, a new Red Hat project focused on creating a standard for distributed inference platforms. LLM-d aims to optimize hardware utilization, manage distributed KV caches, and intelligently route requests based on hardware requirements, integrating with Kubernetes. The goal is to build repeatable blueprints for a common architecture that handles inference-time-scaling workloads through collaborative open-source effort.

Together, Hicks and Wright highlight that scaling the underlying inference infrastructure from single-server instances to a robust, distributed, and transparent platform is the critical bottleneck, and that addressing it through community effort is essential to the future of enterprise AI and the widespread adoption of inference time scaling.

RSS feed: https://feeds.simplecast.com/c1PFREqr
Spotify: https://open.spotify.com/show/7Cpcy42...
For more episodes of No Math AI, subscribe to @redhat
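For readers new to the techniques mentioned above, here is a minimal, self-contained Python sketch of one common inference-time-scaling pattern: sample several candidate answers and keep the best one according to a scoring model. It is an illustration only; the generate and score functions are hypothetical placeholders, not APIs discussed in the episode or shipped by Red Hat.

# Illustrative sketch of inference-time scaling via best-of-N sampling.
# `generate` and `score` are hypothetical stand-ins for an LLM sampling call
# and a verifier/reward model; they are assumptions made for this example.
import random
from typing import List, Tuple

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder for one sampled completion from an LLM."""
    return f"candidate answer (t={temperature}, seed={random.random():.3f})"

def score(prompt: str, completion: str) -> float:
    """Placeholder for a reward/verifier model's quality estimate."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> Tuple[str, float]:
    """Spend more inference-time compute: sample n candidates, keep the best.

    Token cost grows roughly linearly with n, which is why the episode
    stresses lowering unit price and providing cost transparency.
    """
    candidates: List[Tuple[str, float]] = [
        (c, score(prompt, c)) for c in (generate(prompt) for _ in range(n))
    ]
    return max(candidates, key=lambda pair: pair[1])

if __name__ == "__main__":
    answer, quality = best_of_n("How many servers do we need for 10k users?", n=8)
    print(quality, answer)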
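On the infrastructure side, one idea behind distributed inference platforms of the kind LLM-d targets is cache-aware routing: sending a request to the replica that can reuse the most previously computed KV cache, falling back to the least-loaded replica. The toy sketch below illustrates that idea under stated assumptions; the replica model, bookkeeping, and heuristic are invented for clarity and do not reflect LLM-d's actual interfaces or scheduling policy.

# Hypothetical sketch of KV-cache-aware request routing across inference
# replicas. All names and the scoring heuristic are assumptions for
# illustration, not the LLM-d API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Replica:
    name: str
    load: int = 0  # in-flight requests on this replica
    cached_prefixes: List[str] = field(default_factory=list)  # prompt prefixes with a warm KV cache

    def cache_overlap(self, prompt: str) -> int:
        """Length of the longest cached prefix this replica could reuse."""
        return max((len(p) for p in self.cached_prefixes if prompt.startswith(p)), default=0)

def route(prompt: str, replicas: List[Replica]) -> Replica:
    """Prefer replicas that can reuse KV cache; break ties by lowest load."""
    return max(replicas, key=lambda r: (r.cache_overlap(prompt), -r.load))

if __name__ == "__main__":
    pool = [
        Replica("gpu-a", load=3, cached_prefixes=["You are a support agent."]),
        Replica("gpu-b", load=1),
    ]
    target = route("You are a support agent. Summarize ticket #1234.", pool)
    target.load += 1
    print("routed to", target.name)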