An Interview with Microsoft's Saurabh Dighe About Maia 200
Maia 100 was a pre-GPT accelerator. Maia 200 is explicitly post-GPT, built for large multimodal inference. Saurabh Dighe says that if Microsoft were chasing peak performance or trying to span both training and inference, Maia would look very different: higher TDPs, different tradeoffs. Those paths were pruned early to optimize for one thing: inference price-performance. That focus drives the claim of ~30% better performance per dollar versus the latest hardware in Microsoft's fleet.

Topics covered:
• What "30% better price-performance" actually means
• Who Maia 200 is built for
• Why Microsoft bet on inference when designing Maia back in 2022/2023
• Large SRAM + high-capacity HBM
• Massive scale-up, no scale-out
• On-die NIC integration

Maia is a portfolio platform: many internal customers, varied inference profiles, one goal: lower inference cost at planetary scale.

Chapters:
(00:00) Introduction
(01:00) What Maia 200 is and who it's for
(02:45) Why custom silicon isn't just a margin play
(04:45) Inference as an efficient frontier
(06:15) Portfolio thinking and heterogeneous infrastructure
(09:00) Designing for LLMs and reasoning models
(10:45) Why Maia avoids training workloads
(12:00) Betting on inference in 2022–2023, before reasoning models
(14:40) Hyperscaler advantage in custom silicon
(16:00) Capacity allocation and internal customers
(17:45) How third-party customers access Maia
(18:30) Software, compilers, and time-to-value
(22:30) Measuring success and the Maia 300 roadmap
(28:30) What "30% better price-performance" actually means
(32:00) Scale-up vs scale-out architecture
(35:00) Ethernet and custom transport choices
(37:30) On-die NIC integration
(40:30) Memory hierarchy: SRAM, HBM, and locality
(49:00) Long context and KV cache strategy
(51:30) Wrap-up