Deploying large OSS LLMs in public or private cloud infrastructure is a complex task. Users inevitably face challenges such as managing huge model files, provisioning GPU resources, configuring model runtime engines, and handling troublesome Day 2 operations like model upgrades or performance tuning.

In this talk, we will present Kaito, an open-source Kubernetes AI toolchain operator, which simplifies these workflows by containerizing the LLM inference service as a cloud-native application. With Kaito, model files are included in container images for better version control; new CRDs and operators streamline the process of GPU provisioning and workload lifecycle management; and "preset" configurations ease the effort of configuring the model runtime engine. Kaito also supports model customizations such as LoRA fine-tuning and RAG for prompt crafting. Overall, Kaito enables users to manage self-owned OSS LLMs in Kubernetes easily and efficiently, whether in the cloud or on-premises Kubernetes clusters.

Speaker: Ishaan Sehgal
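To make the CRD-driven workflow concrete, below is a minimal sketch of a Kaito Workspace manifest, modeled on the project's published examples. The falcon-7b preset and the Standard_NC12s_v3 instance type are illustrative choices; consult the Kaito documentation for the exact schema supported by your version. Applying this single resource asks the operator to provision a matching GPU node and deploy the preset-configured inference service on it.

    apiVersion: kaito.sh/v1alpha1
    kind: Workspace
    metadata:
      name: workspace-falcon-7b
    resource:
      # GPU VM size the operator should provision for this workload
      instanceType: "Standard_NC12s_v3"
      labelSelector:
        matchLabels:
          apps: falcon-7b
    inference:
      # Preset bundles the model image and runtime engine configuration
      preset:
        name: "falcon-7b"

Fine-tuning follows the same declarative pattern: per the project's documentation, a tuning section in the Workspace (pointing at an input dataset and an output image for the resulting LoRA adapter) takes the place of the inference section, and the operator runs the tuning job on the provisioned GPU nodes.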