Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon events in Hong Kong, China (June 10-11); Tokyo, Japan (June 16-17); Hyderabad, India (August 6-7); Atlanta, US (November 10-13). Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at https://kubecon.io

More Nodes, More Problems: Solving Multi-Host GPU/TPU Scheduling With Dynamic Resource Allocation - John Belamaric & Morten Torkildsen, Google

Big training jobs and multi-host inference need a lot of nodes and accelerators, and more nodes and accelerators mean more chances for failure. How can we be sure to have enough working GPUs for our job? How can we utilize the healthy portions of a 16x16 TPU cluster if one node fails? Simple node labels won't cut it.

Dynamic Resource Allocation (DRA) is beta in Kubernetes 1.32. Usually it's used for managing individual devices on a node, but DRA also supports modeling resources that are accessible across many nodes. This powerful abstraction can model clusters of nodes and devices. Combining it with the alpha partitionable device model in 1.33, we can correctly model complex multi-host, multi-accelerator topologies, and schedule workloads to them as a unit! This is a real game changer for AI/ML workloads on K8s.

Come learn about these current and upcoming technologies, and how the K8s community is applying them to massive compute clusters like the NVIDIA GB200 and ultra-powerful multi-host TPU slices.
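To give a flavor of the DRA model the talk builds on: instead of selecting nodes by label, a workload attaches a ResourceClaim that requests devices by device class, and the scheduler allocates matching devices wherever they live. A minimal sketch of such a claim against the 1.32 beta API is below; the device class name `tpu.example.com` is a placeholder, not a real driver.

```yaml
# Hypothetical sketch: request four accelerators through DRA
# (resource.k8s.io v1beta1, beta in Kubernetes 1.32).
# "tpu.example.com" is an illustrative device class, not a shipped driver.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: multi-host-accelerators
spec:
  devices:
    requests:
    - name: accelerators
      deviceClassName: tpu.example.com
      count: 4
```

A Pod (or every Pod in a multi-host job) would then reference this claim in its `resourceClaims`, so the group of devices is allocated and scheduled as a unit rather than pieced together from node labels.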