Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon in Amsterdam, The Netherlands (23-26 March, 2026). Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at https://kubecon.io

Intelligent LLM Routing: A New Paradigm for Multi-Model AI Orchestration in Kubernetes - Chen Wang, IBM Research & Huamin Chen, Red Hat

This research-driven talk introduces a novel architecture paradigm that complements recent advances in intelligent inference routing for large language models. By integrating proxy-based classification and reranking techniques, we've developed a system that efficiently routes incoming prompts to domain-specialized LLMs based on rapid content analysis. Our approach creates a meta-layer of intelligence above traditional model-serving infrastructure, enabling specialized models to handle the queries they're optimized for while maintaining a unified API interface.

We'll present performance research comparing this distributed approach against monolithic inference-time scaling, demonstrating how intelligent routing can achieve superior results for complex, multi-domain workloads while reducing computational overhead. The session includes a Kubernetes-based reference implementation and a quantitative analysis of throughput, latency, and accuracy across diverse prompt categories.
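To make the routing idea concrete, here is a minimal, hypothetical sketch of a classification-based proxy: it scores an incoming prompt against per-domain signatures and forwards it to the best-matching specialized model, falling back to a general model when nothing matches. The domain names, endpoints, and keyword-overlap scoring are illustrative assumptions only, not the classifier or reranker presented in the talk (which performs rapid content analysis in a serving proxy).

```python
# Hypothetical sketch of proxy-based LLM routing. All model names,
# domains, and the keyword-overlap "classifier" are assumptions for
# illustration; a real system would use a learned classifier/reranker.
from dataclasses import dataclass


@dataclass
class Route:
    model: str          # name of a domain-specialized model
    keywords: set[str]  # crude domain signature used for scoring


# Last entry is the general-purpose fallback model.
ROUTES = [
    Route("code-llm",    {"python", "function", "bug", "compile", "regex"}),
    Route("medical-llm", {"diagnosis", "symptom", "dosage", "patient"}),
    Route("general-llm", set()),
]


def classify(prompt: str) -> Route:
    """Pick the route whose keyword profile best overlaps the prompt."""
    tokens = set(prompt.lower().split())
    best = max(ROUTES[:-1], key=lambda r: len(r.keywords & tokens))
    # No domain signal at all -> route to the general fallback model.
    if not (best.keywords & tokens):
        return ROUTES[-1]
    return best


def route(prompt: str) -> str:
    """Return the target model; a real proxy would forward the request
    to that model's serving endpoint behind one unified API."""
    return classify(prompt).model


print(route("Why does this python function raise a regex error"))  # code-llm
print(route("What is the recommended dosage for this patient"))    # medical-llm
print(route("Tell me a story about dragons"))                      # general-llm
```

In a Kubernetes deployment this logic would sit in the request path (e.g. an ingress or sidecar proxy) so that callers see a single endpoint while each specialized model serves only the traffic it is optimized for.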