Keynote: AI Data Center Networking: Lessons from Meta's Evolution
Abstract

Meta has iterated through multiple generations of networking architectures to support increasingly demanding machine learning workloads. This keynote outlines the journey through successive AI network deployments, examining how each iteration informed the next and sharing hard-won lessons from production operations.

It will provide a comprehensive view of the AI networking evolution through the lens of complementary technology layers: PyTorch and AI frameworks, xPU characteristics and their traffic patterns, NIC selection and capabilities, and the resulting network implications. The talk will explore how decisions at each layer cascade through the stack: how framework behavior influences hardware selection, how accelerator characteristics drive network topology choices, and how NIC capabilities enable or constrain operational approaches.

The discussion will cover practical experiences with network automation at scale, infrastructure density challenges (power, cooling, space), telemetry approaches for AI workload visibility, and operational strategies for managing rapid technology transitions while maintaining production stability. Attendees will gain insight into the architectural decisions, false starts, and breakthrough solutions that emerged from deploying and operating multiple generations of AI clusters in production.

Omar Baldonado: Omar Baldonado leads the groups that develop and operate Meta's global data center networks. These networks support all of Meta's AI models and the Meta family of apps (Meta AI, Facebook, Instagram, WhatsApp, Messenger). These groups have developed some of the largest AI clusters in the world (with gigawatt-scale clusters on the way), and they continually share their work through open-source libraries (e.g., TorchComms for PyTorch, FBOSS for switches) and in communities like the Open Compute Project. Omar has been in networking since the early 1990s.