CMU/Tsinghua/Zhejiang/UC Berkeley: Maximum Likelihood Reinforcement Learning
🚀 Unlocking the Future of Reinforcement Learning with MaxRL!
https://www.emergent-behaviors.com/ma...

In this video, we explore the research paper "Maximum Likelihood Reinforcement Learning" by Fahim Tajwar, Guanning Zeng, and their colleagues at Carnegie Mellon University, Tsinghua University, Zhejiang University, and UC Berkeley. Discover how MaxRL addresses the limitations of traditional reinforcement learning by maximizing the true likelihood of successful outcomes, rather than just optimizing for expected reward. We delve into the key differences between reinforcement learning and maximum likelihood methods, uncovering why focusing on low-probability failures can lead to more robust learning. Join us as we unpack the mechanics of MaxRL, its computational advantages, and how it significantly enhances inference efficiency in applications such as maze navigation and mathematical problem solving. (For the math-inclined: a short sketch of the 1/p(x) gradient relationship, with a toy example, follows the credits at the end of this description.)

📌 What You'll Learn:
• The critical distinction between reinforcement learning and maximum likelihood optimization
• How MaxRL changes the learning dynamics to focus on harder prompts
• The efficiency gains in training and inference with MaxRL
• Empirical results showing MaxRL's superiority on mathematical benchmarks

⏳ Timestamps:
0:00 Introduction to Maximum Likelihood Reinforcement Learning
0:41 The Non-Differentiable Zone - Why RL Shows Up
1:24 RL Gradient vs ML Gradient - The 1/p(x) Punchline
2:30 RL as a First-Order Approximation - Bringing Back the Missing Terms
3:15 Compute-Indexed Family - Caveman Mode to Galaxy Brain
3:57 Algorithm 1 - The Denominator Change that Changes the Objective
4:47 The Weighting Function - Why Easy Problems Get Too Much Love
5:28 Controlled Experiment - MaxRL Matches Cross-Entropy Behavior
6:04 Infinite-Data Mazes - Negative Log Pass@k Improves with Rollouts
6:44 Data-Scarce GSM8K - Avoiding Collapse at Pass@128
7:25 Math Benchmarks - Pareto Dominance on Qwen Models
8:07 Inference Efficiency - Spend Compute in Training, Save at Test Time
8:46 Gradient Norm vs Pass Rate - Learning from Failure, Not Mediocrity
9:22 Practical Drop-in Guidance - On-Policy, Compatible, Fix Normalization
10:03 Wrap-Up - Why MaxRL Exists and What It Buys You

📄 Paper: "Maximum Likelihood Reinforcement Learning"
https://arxiv.org/pdf/2602.02710

Authors:
Fahim Tajwar, Carnegie Mellon University, ftajwar@andrew.cmu.edu
Guanning Zeng, Tsinghua University
Yueer Zhou, Zhejiang University
Yuda Song, Carnegie Mellon University
Daman Arora, Carnegie Mellon University
Yiding Jiang, Carnegie Mellon University
Jeff Schneider, Carnegie Mellon University
Ruslan Salakhutdinov, Carnegie Mellon University
Haiwen Feng, UC Berkeley
Andrea Zanette, Carnegie Mellon University, azanette@andrew.cmu.edu

#ReinforcementLearning #MaxRL #MachineLearning #AIResearch #CarnegieMellon #DeepLearning #Optimization #DataScience #Algorithm #Math #InferenceEfficiency #ArtificialIntelligence #TechInnovation #Research #ComputerScience
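📐 The 1/p(x) punchline, sketched. This is our notation, not verbatim from the paper: assume a binary reward r(x, y) ∈ {0, 1} and write p_θ(x) for the policy's probability of solving prompt x, i.e. p_θ(x) = E_{y ~ π_θ(·|x)}[r(x, y)]. Standard RL maximizes the expected pass rate, while a maximum-likelihood objective maximizes its logarithm:

\[ J_{\mathrm{RL}}(\theta) = \mathbb{E}_x\big[\, p_\theta(x) \,\big], \qquad \nabla_\theta J_{\mathrm{RL}} = \mathbb{E}_x\big[\, \nabla_\theta\, p_\theta(x) \,\big] \]
\[ J_{\mathrm{ML}}(\theta) = \mathbb{E}_x\big[\, \log p_\theta(x) \,\big], \qquad \nabla_\theta J_{\mathrm{ML}} = \mathbb{E}_x\!\left[ \frac{\nabla_\theta\, p_\theta(x)}{p_\theta(x)} \right] \]

By the chain rule, each prompt's gradient picks up a 1/p_θ(x) factor: prompts the model rarely solves dominate the update, and near-saturated prompts contribute almost nothing. Both objectives can reuse the standard score-function (REINFORCE) identity \( \nabla_\theta\, p_\theta(x) = \mathbb{E}_{y \sim \pi_\theta(\cdot|x)}\big[\, r(x, y)\, \nabla_\theta \log \pi_\theta(y|x) \,\big] \), so in a rollout-based estimator only a per-prompt denominator changes.

Here is a toy Monte Carlo version of that reweighting in Python (illustrative only; the function names and the eps smoothing are our choices, not necessarily how the paper's Algorithm 1 handles all-failure prompts):

import numpy as np

def rl_weights(rewards: np.ndarray) -> np.ndarray:
    # Standard expected-reward gradient: each rollout's log-prob
    # gradient is weighted by its own (binary) reward.
    return rewards

def ml_weights(rewards: np.ndarray, eps: float = 1e-2) -> np.ndarray:
    # Log-likelihood-of-success gradient: same weights, divided by the
    # prompt's empirical pass rate. eps avoids dividing by zero when
    # every rollout fails (our assumption, not the paper's exact fix).
    pass_rate = rewards.mean()
    return rewards / (pass_rate + eps)

# 10 rollouts each for an easy prompt (80% pass) and a hard one (10% pass)
easy = np.array([1, 1, 1, 1, 0, 1, 1, 1, 1, 0], dtype=float)
hard = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=float)
print(ml_weights(easy).max())  # ≈ 1.23: easy successes are barely upweighted
print(ml_weights(hard).max())  # ≈ 9.09: the rare success on a hard prompt dominates

The point of the sketch: dividing by the per-prompt pass rate is what turns a reward-weighted update into a likelihood-weighted one, which matches the "denominator change" and "learning from failure" themes in the timestamps above.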