Don't miss out! Join us at our next flagship conference: KubeCon + CloudNativeCon in Amsterdam, The Netherlands (23-26 March 2026). Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at https://kubecon.io

No More GPU Cold Starts: Making Serverless ML Inference Truly Real-Time - Nikunj Goyal, Adobe & Aditi Gupta, Disney Plus Hotstar

Serverless ML inference is great, but when GPUs are involved, cold starts can turn milliseconds into minutes. Whether you are scaling transformer models or running custom inference services, the startup latency caused by container initialization, GPU driver loading, and heavyweight model deserialization can kill real-time performance and cost you a lot of money.

In this talk, we'll break down the anatomy of GPU cold starts in modern ML serving stacks: why GPUs introduce unique cold-path delays, how the CRI and device plugins contribute to them, and what really happens when a PyTorch model boots up on a fresh pod. We'll walk through production-ready strategies to reduce startup latency:

- Pre-warmed GPU pod pools to bypass init time
- Model snapshotting with TorchScript or ONNX to speed up deserialization
- Lazy loading techniques that delay model initialization until the first request

Together, these strategies help you eliminate cold-start pain and keep your services fast, efficient, and production-ready.
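The lazy-loading strategy mentioned above can be sketched in a few lines. This is a minimal, framework-agnostic illustration (not from the talk itself): a wrapper defers the expensive model load, e.g. a `torch.jit.load` call, until the first `predict`, so the pod can report Ready immediately; a lock guards against several concurrent first requests each triggering a load. The `slow_loader` stand-in and its call counter are hypothetical names used only for this sketch.

```python
import threading

class LazyModel:
    """Defer expensive model initialization until the first request.

    The serving process starts (and passes readiness checks) right away;
    the cold-start cost is paid once, on the first inference call,
    instead of at container boot.
    """

    def __init__(self, loader):
        self._loader = loader          # callable that builds/deserializes the model
        self._model = None
        self._lock = threading.Lock()  # guard against concurrent first requests

    def predict(self, x):
        if self._model is None:              # fast path once warmed up
            with self._lock:
                if self._model is None:      # double-checked locking
                    self._model = self._loader()
        return self._model(x)

# Hypothetical stand-in for an expensive load such as torch.jit.load("model.pt").
# The counter just demonstrates that the loader runs exactly once.
calls = {"n": 0}

def slow_loader():
    calls["n"] += 1
    return lambda x: x * 2   # toy "model"

model = LazyModel(slow_loader)
print(model.predict(21))  # loader runs here, on the first call
print(model.predict(5))   # subsequent calls reuse the loaded model
```

The trade-off, as the abstract implies, is that the first request absorbs the latency; in practice this pattern is often combined with the pre-warming approach, where a warm-up request is fired right after the pod starts.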