Arthur Douillard - Distributed Training in Machine Learning
This session is part of the Cohere Labs Open Science Community Summer School, a learning initiative featuring some of the leading minds in machine learning from INRIA, META (FAIR), Google DeepMind, Cohere Labs, and more. Learn more about all upcoming speakers in this event series.

This talk provides an overview of the landscape of distributed deep learning for LLMs. Because of their sheer size, LLMs must be trained across multiple GPUs. We'll first cover methods that shard computation across colocated GPUs, such as Fully Sharded Data Parallelism (FSDP) and Pipeline and Expert Parallelism (PP & EP). Then we'll explore more experimental methods such as DiLoCo, SWARM, PowerSGD, and DeMo, which often come with an ML cost but could enable training on GPUs spread across the world.

This session is brought to you by the Cohere Labs Open Science Community, a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you to Ahmad Anis, Lead of our Geo Regional Asia group, for their dedication in organizing this event.

If you're interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker. Join the Cohere Labs Open Science Community to see a full list of upcoming events (https://tinyurl.com/CohereLabsCommuni....)
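To make the DiLoCo-style idea mentioned above concrete (workers train independently for many local steps, then synchronize infrequently via an outer update on averaged parameter deltas), here is a minimal toy sketch on a quadratic problem. The function names, hyperparameters, and loss are illustrative assumptions for this sketch, not details from the talk:

```python
import numpy as np

def local_sgd(theta, target, steps, lr):
    # Inner optimization: plain SGD on a worker-local quadratic loss,
    # f_k(theta) = 0.5 * ||theta - target_k||^2 (a toy stand-in for
    # each worker's data shard).
    for _ in range(steps):
        theta = theta - lr * (theta - target)
    return theta

def diloco_sketch(targets, rounds=50, inner_steps=10, inner_lr=0.1, outer_lr=0.7):
    # DiLoCo-style loop (sketch): each round, every worker starts from the
    # shared parameters, runs `inner_steps` local updates without any
    # communication, and the "pseudo-gradient" (shared params minus local
    # params) is averaged and applied as a single outer update. Real DiLoCo
    # uses an outer optimizer with momentum (Nesterov); plain outer SGD
    # here keeps the sketch short.
    theta = np.zeros_like(targets[0])
    for _ in range(rounds):
        local_params = [local_sgd(theta.copy(), t, inner_steps, inner_lr)
                        for t in targets]
        pseudo_grad = np.mean([theta - p for p in local_params], axis=0)
        theta = theta - outer_lr * pseudo_grad  # one communication per round
    return theta

# Three simulated workers with different local optima.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
theta = diloco_sketch(targets)
# theta converges toward the mean of the worker targets, approx [1.0, 1.0]
```

The point of the communication pattern is that workers exchange parameters once per round instead of once per step, which is what makes training across geographically distributed GPUs plausible despite the ML cost the talk alludes to.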