In January 2025, the Chinese company DeepSeek shocked the AI world with the release of R1, a model that requires only a fraction of the compute used by leading American counterparts. But the real breakthrough isn't just the weights: it's the architecture. In this video, we break down Multi-head Latent Attention (MLA), the innovation that strikes at the core of the Transformer architecture. We explore how DeepSeek managed to shrink the KV cache, a critical memory bottleneck during inference, by a staggering factor of 57. This allows the model to generate text more than six times faster than traditional Transformers while actually improving modeling performance.

Key topics covered:
• The mechanics of the standard attention mechanism.
• Why the KV cache usually causes memory usage to explode in Large Language Models (LLMs).
• How DeepSeek uses a latent space to compress keys and values efficiently (see the sketch below).
• A comparison of Multi-Query Attention, Grouped-Query Attention, and DeepSeek's superior MLA.
• The "absorbed weights" trick that reduces compute during inference.

DeepSeek has carved a new path in AI history, proving that clever linear algebra can unlock levels of intelligence and efficiency we previously thought impossible.
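For viewers who want a concrete picture of the latent-space idea before watching, here is a minimal Python/NumPy sketch of KV compression. The dimensions (d_model, d_latent) and weight names (W_down, W_up_k, W_up_v) are hypothetical placeholders, not DeepSeek's actual configuration, and the sketch omits details of the real architecture such as the decoupled rotary-embedding keys.

# A minimal, self-contained sketch of latent KV compression under assumed,
# placeholder dimensions. It is NOT DeepSeek's actual code or configuration.
import numpy as np

d_model  = 1024   # hidden size (placeholder value)
d_latent = 64     # compressed latent size (placeholder value)

rng = np.random.default_rng(0)

# Standard attention: each cached token stores a full key and a full value.
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

# Latent-attention style: project the hidden state down to a small latent vector,
# cache only that, and reconstruct keys/values from it when they are needed.
W_down = rng.standard_normal((d_model, d_latent))   # shared down-projection
W_up_k = rng.standard_normal((d_latent, d_model))   # up-projection for keys
W_up_v = rng.standard_normal((d_latent, d_model))   # up-projection for values

h = rng.standard_normal(d_model)   # hidden state of one newly generated token

# What a standard KV cache stores per token: 2 * d_model floats.
k_full, v_full = h @ W_k, h @ W_v

# What a latent KV cache stores per token: d_latent floats.
c = h @ W_down
k_recon, v_recon = c @ W_up_k, c @ W_up_v   # recomputed on the fly from the latent

# The "absorbed weights" trick: at inference, W_up_k can be folded into the query
# projection, so keys never have to be materialized at all; attention scores are
# computed directly against the cached latents.
print("standard cache, floats per token:", k_full.size + v_full.size)   # 2048
print("latent cache,   floats per token:", c.size)                      # 64
print("compression factor:", (k_full.size + v_full.size) / c.size)      # 32x

The toy numbers above give a 32x reduction per token; the 57x figure discussed in the video comes from DeepSeek's specific head counts and dimensions, which this illustration does not reproduce.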