🚀 Discover the Future of Transformers with STEM!
https://www.emergent-behaviors.com/st...

In this video, we explore "STEM: Scaling Transformers with Embedding Modules," research from Carnegie Mellon University and Meta AI that addresses the inefficiencies of traditional Transformer architectures. Learn how STEM leverages embedding modules to improve model performance while minimizing computational cost. We'll walk through the architecture changes that make STEM a game-changer, including its ability to stabilize training and improve accuracy per FLOP. By the end of this video, you'll understand this new paradigm and its potential impact on the field of AI.

📌 What You'll Learn:
• 🧠 How STEM replaces expensive matrix multiplications with efficient lookup mechanisms (see the first sketch at the end of this description)
• 📉 Why Mixture of Experts (MoE) training is unstable and how STEM mitigates those failure modes
• 📊 The significance of training return on investment (ROI) for model performance
• 🔍 Insights into interpretability and knowledge editing in embedding layers
• ⚖️ The advantages of long-context scaling for better retrieval and efficiency

⏳ Timestamps:
0:00 Introduction to STEM: Scaling Transformers with Embedding Modules
0:42 Why MoE Hurts in Practice: Instability, Bandwidth, Complexity
1:27 The Epiphany: FFNs as Key-Value Memory and Tokens as Addresses
2:14 Architecture Swap: Replace the Up-Projection with a Token Embedding Table
3:06 System Trick: CPU Offloading, Prefetching, and Token Deduplication (sketched below)
4:06 Validation: Perplexity Curves Without the 'Heart Attack'
4:44 Training ROI (ROT): More Accuracy Per FLOP
5:33 Angular Spread: STEM Embeddings Reduce Interference
6:17 Interpretability and Knowledge Editing: Finding Where 'Spain' Lives
7:16 Token Length Mismatch: Editing Across 1-Token vs 2-Token Words
8:05 Test-Time Capacity Scaling: Longer Context Activates More Embeddings
8:53 The Graveyard of Failed Approaches: What Not to Replace
9:54 Scoreboard: Where STEM Helps Most
10:37 Final Tally: Smarts, Interpretability, Long-Context Scaling
11:39 Bottom Line and References: Stability, Efficiency, and Where to Read More

📄 Paper:
STEM: Scaling Transformers with Embedding Modules
https://arxiv.org/pdf/2601.10639
Ranajoy Sadhukhan, Carnegie Mellon University
Sheng Cao, Carnegie Mellon University
Harry Dong, Carnegie Mellon University
Changsheng Zhao, Carnegie Mellon University
Attiano Purpura-Pontoniere, Meta AI
Yuandong Tian, Meta AI
Zechun Liu, Meta AI
Beidi Chen, Meta AI

#AI #Transformers #MachineLearning #STEM #EmbeddingModules #Research #DeepLearning #NLP #ArtificialIntelligence #CMU #MetaAI #TechInnovation #KnowledgeEditing #Architecture #ModelPerformance
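🧪 Code Sketches (illustrative, not from the paper):

To make the 2:14 "architecture swap" concrete, here is a minimal PyTorch-style sketch contrasting a standard FFN with a STEM-style module whose up-projection is an embedding lookup. The class names, shapes, activation, and how the result is mixed back into the residual stream are our own assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class StandardFFN(nn.Module):
    """Baseline Transformer FFN: two dense matrix multiplies per token."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)    # the expensive up-projection
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> (batch, seq, d_model)
        return self.down(torch.relu(self.up(x)))

class STEMStyleModule(nn.Module):
    """Sketch of the swap described at 2:14: the up-projection matmul is
    replaced by an embedding table addressed by token id, so hidden
    activations are fetched by lookup rather than computed. Exact gating
    and normalization are assumptions, not the paper's recipe."""
    def __init__(self, vocab_size: int, d_model: int, d_hidden: int):
        super().__init__()
        self.table = nn.Embedding(vocab_size, d_hidden)  # tokens as addresses
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq). Each token indexes one row of the table,
        # replacing an O(d_model * d_hidden) matmul with a memory lookup.
        h = torch.relu(self.table(token_ids))            # (batch, seq, d_hidden)
        return self.down(h)
```

This mirrors the "FFNs as key-value memory, tokens as addresses" framing from 1:27: the table rows play the role of the memory values that a dense up-projection would otherwise compute on the fly.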
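A similarly hedged sketch of the 3:06 system trick: because tokens repeat within a batch, deduplicating before fetching keeps host-to-device traffic proportional to the number of unique tokens rather than the sequence length. The function name is hypothetical, and the synchronous copy below is a simplification of the asynchronous prefetching the video describes.

```python
import torch

def gather_embeddings_deduped(cpu_table: torch.Tensor,
                              token_ids: torch.Tensor,
                              device: str = "cuda") -> torch.Tensor:
    """Keep the large embedding table in CPU RAM, fetch only the rows for
    the unique tokens in the batch, move that compact slice to the GPU in
    a single copy, then expand back to per-token order."""
    unique_ids, inverse = torch.unique(token_ids, return_inverse=True)
    rows = cpu_table[unique_ids]               # small gather on the CPU table
    rows = rows.to(device, non_blocking=True)  # one compact host-to-device copy
    return rows[inverse]                       # (batch, seq, d_hidden) on device

# Usage: a 50k-row table with a batch containing many repeated tokens.
table = torch.randn(50_000, 1024)              # resides in CPU memory
ids = torch.randint(0, 50_000, (4, 2048))
emb = gather_embeddings_deduped(table, ids, device="cpu")  # "cuda" if available
```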