In this episode of SciPulse, we explore a major architectural shift in artificial intelligence developed by researchers from DeepSeek-AI and Peking University. While Mixture-of-Experts (MoE) has become the standard for scaling Large Language Models (LLMs), it lacks a native primitive for simple knowledge lookup, often forcing models to waste computational "depth" simulating retrieval. Enter Engram: a "conditional memory" module that modernises classic N-gram embeddings to create a new axis of sparsity for AI. By enabling constant-time O(1) lookups of static information, such as named entities and formulaic patterns, Engram lets the model allocate its neural computation where it matters most: complex reasoning. (A toy sketch of the lookup idea appears at the end of these notes.)

What You Will Learn in This Podcast:
• The Sparsity Allocation Law: Discover the U-shaped scaling law uncovered by the DeepSeek team. The research shows that the best performance comes from a hybrid approach, reallocating roughly 20–25% of the sparse parameter budget from MoE experts to Engram memory (see the budget arithmetic after these notes).
• Creating "Effective Depth": Learn how Engram relieves the model's early layers of the burden of reconstructing static knowledge. Mechanistic analysis shows that this effectively "deepens" the network, reaching "prediction-ready" states much earlier than standard models do.
• Benchmarking Success: We break down the performance of Engram-27B, which outperformed strictly iso-parameter MoE baselines in general reasoning (BBH +5.0), mathematics (MATH +2.4), and coding (HumanEval +3.0).
• Long-Context Superiority: By delegating local dependencies to lookups, the model preserves more attention capacity for global context, boosting Multi-Query Needle-in-a-Haystack scores from 84.2 to 97.0.
• Infrastructure-Aware Design: See how DeepSeek-AI bypassed GPU memory constraints by using deterministic addressing to prefetch massive parameter tables (100B+ parameters) from host memory with less than 3% overhead (a prefetch sketch closes these notes).

This episode is essential listening for AI researchers, computer science students, and anyone interested in how DeepSeek is pushing the boundaries of efficient, large-scale machine learning.

Educational Disclaimer: This podcast provides a summary of the scientific findings for educational purposes. It is not a substitute for the original research paper. We encourage all viewers to consult the full text for specific methodologies and technical data.

Video: • DeepSeek’s Engram: Building Conditional Me...
Original Research Paper: https://www.arxiv.org/pdf/2601.07372

#DeepSeek #AI #MachineLearning #LLMs #Engram #ComputerScience #DeepLearning #NLP #Sparsity #Research #MoE #TechInnovation #SciPulse #ArtificialIntelligence #LargeLanguageModels
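
Toy Code Sketches (illustrative only):

First, the lookup idea in miniature. The PyTorch module below hashes the last N token IDs into a fixed-size embedding table, so each position reads its "memory" row in constant O(1) time regardless of how large the table is. All names here (NgramLookup, the rolling polynomial hash, the table size) are hypothetical illustrations of the concept described in the episode, not DeepSeek's implementation.

```python
import torch
import torch.nn as nn

class NgramLookup(nn.Module):
    """Toy conditional-memory lookup: hash the last n token IDs to a table row."""

    def __init__(self, table_size: int = 100_003, dim: int = 64, n: int = 2):
        super().__init__()
        self.n = n
        self.table_size = table_size  # a prime size helps spread hash collisions
        # The sparse "memory": only the rows addressed by the current batch are
        # touched, so per-token cost is independent of table_size.
        self.table = nn.Embedding(table_size, dim)

    def addresses(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq). Rolling polynomial hash over the last n tokens.
        h = torch.zeros_like(token_ids)
        for k in range(self.n):
            prev = torch.roll(token_ids, shifts=k, dims=1)
            prev[:, :k] = 0  # positions before the sequence start contribute nothing
            h = h * 31 + prev  # simple deterministic mixing (illustrative)
        return h % self.table_size  # one O(1) address per position

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.table(self.addresses(token_ids))  # (batch, seq, dim)

mem = NgramLookup()
tokens = torch.randint(0, 32_000, (2, 16))
print(mem(tokens).shape)  # torch.Size([2, 16, 64])
```

Because the address is a pure function of the token IDs, retrieval needs no neural computation at all, which is what frees the early layers for reasoning.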
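
Second, the Sparsity Allocation Law as a worked budget split. The numbers below are illustrative: the episode reports an optimum around a 20–25% Engram share of the sparse parameter budget, but the paper's exact accounting may differ.

```python
def split_sparse_budget(total_sparse_params: float, engram_fraction: float = 0.25):
    """Split a fixed sparse-parameter budget between Engram memory and MoE experts.

    The U-shaped law says both extremes hurt: engram_fraction = 0 (pure MoE)
    and engram_fraction near 1 (almost all memory) underperform the hybrid.
    """
    assert 0.0 <= engram_fraction <= 1.0
    engram = total_sparse_params * engram_fraction
    moe = total_sparse_params - engram
    return engram, moe

# Illustrative only: a 20B sparse budget at the 25% point of the U-curve.
engram_p, moe_p = split_sparse_budget(20e9, 0.25)
print(f"Engram memory: {engram_p / 1e9:.0f}B params, MoE experts: {moe_p / 1e9:.0f}B params")
```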
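
Finally, the infrastructure trick. Because addresses depend only on token IDs (no activations), the rows a layer will need are known before it runs, so a huge table can live in host RAM and be copied to the GPU asynchronously. The PyTorch calls below are real; the orchestration is a simplified sketch of how a 100B+-parameter table can stay off the GPU, not DeepSeek's pipeline.

```python
import torch

def prefetch_rows(cpu_table: torch.Tensor, addresses: torch.Tensor,
                  device: torch.device) -> torch.Tensor:
    rows = cpu_table[addresses]      # deterministic gather in host memory
    if device.type == "cuda":
        rows = rows.pin_memory()     # page-locked staging buffer for fast DMA
        # non_blocking=True lets the host-to-device copy overlap GPU compute
        return rows.to(device, non_blocking=True)
    return rows

table = torch.randn(250_000, 64)     # the big table lives in host RAM
addr = torch.randint(0, 250_000, (2, 16))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(prefetch_rows(table, addr, device).shape)  # torch.Size([2, 16, 64])
```

Issued a step ahead of the layer that consumes the rows, this kind of overlap is what makes the reported sub-3% overhead plausible.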