Routing Stateful AI Workloads in Kubernetes - Maroon Ayoub, IBM & Michey Mehta, Red Hat

Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon events in Amsterdam, The Netherlands (23-26 March, 2026). Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at https://kubecon.io

Kubernetes excels at stateless service routing, but modern AI workloads are not stateless. Generative workloads demand context-aware routing that maximizes performance while reducing costs. This talk explores layered routing strategies for stateful LLM workloads on Kubernetes, from round-robin to full KV-Cache-aware load balancing. We’ll explain when each level applies and how it affects performance.

Based on our experience developing llm-d, a framework using the K8s Gateway API Inference Extension and a collaboration between Google, IBM Research, and Red Hat, we’ll cover:

- Why traditional Kubernetes routing falls short for generative AI
- Routing patterns for long-context, sessionful traffic
- Global cache indices and local offloading for smart routing
- Benchmarks showing latency, cache hit rates, and GPU utilization
- Practical ways to adopt cache-aware routing without major infra changes

If you’re scaling multi-turn, agentic, or LLM-powered workloads, this session is for you.
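To make the idea of cache-aware routing concrete, here is a minimal toy sketch in Python. It is not llm-d's implementation or the Gateway API Inference Extension's API; the class name `PrefixAwareRouter`, the `BLOCK_SIZE` value, and the whitespace "tokens" are all assumptions made for illustration. It keeps a global index from prompt-prefix block hashes to the pods believed to hold them, sends each request to the pod with the longest cached prefix, and falls back to the least-loaded pod when nothing matches.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4  # tokens per cache block (hypothetical; chosen small for the demo)


def block_hashes(tokens):
    """Chain-hash each full block so every hash identifies the whole prefix up to it."""
    hashes, prev = [], ""
    full = len(tokens) - len(tokens) % BLOCK_SIZE  # ignore a trailing partial block
    for i in range(0, full, BLOCK_SIZE):
        block = " ".join(tokens[i:i + BLOCK_SIZE])
        prev = hashlib.sha256((prev + "|" + block).encode()).hexdigest()
        hashes.append(prev)
    return hashes


class PrefixAwareRouter:
    """Toy router: longest cached prefix wins, ties go to the least-loaded pod."""

    def __init__(self, pods):
        self.pods = list(pods)
        self.index = defaultdict(set)          # block hash -> pods believed to hold it
        self.load = {p: 0 for p in self.pods}  # requests routed to each pod so far

    def route(self, tokens):
        hashes = block_hashes(tokens)
        matched = {p: 0 for p in self.pods}    # matched prefix length, in blocks
        candidates = set(self.pods)
        for depth, h in enumerate(hashes, start=1):
            candidates &= self.index.get(h, set())
            if not candidates:
                break
            for p in candidates:
                matched[p] = depth
        # Prefer the longest matched prefix, then the lowest load, then pod order.
        chosen = max(self.pods, key=lambda p: (matched[p], -self.load[p]))
        hit_blocks = matched[chosen]
        # Assume the chosen pod now caches every block of this prompt.
        for h in hashes:
            self.index[h].add(chosen)
        self.load[chosen] += 1
        return chosen, hit_blocks
```

With two pods, a first request lands on whichever pod is least loaded, while a follow-up request that shares its opening blocks is routed back to the same pod and reports how many blocks were matched. A real deployment would also need cache eviction signals and live load metrics, which this sketch leaves out.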
