Slides: https://drive.google.com/file/d/1jmA5...

At Ray Summit 2025, Suman Debnath and Linda Haviv from Anyscale share how to master the distributed training strategies essential for efficiently scaling today's deep learning models. They begin by breaking down the core techniques (data parallelism, model parallelism, and pipeline parallelism) and explain when each approach is most effective as models and datasets grow. The session covers advanced methods such as sharded training and ZeRO, along with the tradeoffs that arise in real-world large-cluster environments. Suman and Linda also address the toughest challenges in distributed training, including communication overhead, fault tolerance, reproducibility, and managing heterogeneous compute. They then demonstrate how PyTorch and Ray can be combined to implement these strategies with minimal code changes, making it easier to scale from prototype to production.

What you'll learn:
- How data, model, and pipeline parallelism work, and when to apply each
- How to overcome scalability bottlenecks such as communication overhead and system failures
- How to use Ray with PyTorch to launch, orchestrate, and monitor large-scale distributed training jobs

Liked this video? Check out other Ray Summit breakout session recordings: Ray Summit 2025 - Breakout Sessions

Subscribe to our YouTube channel to stay up-to-date on the future of AI! / anyscale

🔗 Connect with us:
LinkedIn: / joinanyscale
X: https://x.com/anyscalecompute
Website: https://www.anyscale.com/
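To give a flavor of the "minimal code changes" approach described above, here is a small sketch of data-parallel PyTorch training run with Ray Train's TorchTrainer. It is not code from the session: the toy model, synthetic dataset, and hyperparameters are illustrative assumptions, and it assumes a Ray cluster (or local Ray) with GPUs available.

```python
# Minimal sketch: data-parallel PyTorch training with Ray Train.
# Toy model/data and hyperparameters are placeholders, not from the talk.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

import ray.train.torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    # Synthetic regression data; swap in a real dataset in practice.
    features = torch.randn(1024, 16)
    labels = torch.randn(1024, 1)
    dataloader = DataLoader(TensorDataset(features, labels), batch_size=64)
    # Shards batches across workers (adds a DistributedSampler).
    dataloader = ray.train.torch.prepare_data_loader(dataloader)

    model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
    # Wraps the model in DistributedDataParallel and moves it to the worker's device.
    model = ray.train.torch.prepare_model(model)

    loss_fn = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])

    for epoch in range(config["epochs"]):
        for x, y in dataloader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
        # Report per-epoch metrics back to the Ray Train driver.
        ray.train.report({"epoch": epoch, "loss": loss.item()})


trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"lr": 1e-3, "epochs": 3},
    # Set use_gpu=False to try the sketch on a CPU-only machine.
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
result = trainer.fit()
```

In this pattern the training loop stays ordinary PyTorch; Ray handles worker placement and process-group setup, and scaling up is largely a matter of changing the ScalingConfig, which is the prototype-to-production path the session describes.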