Proximal Policy Optimization (PPO) & Group Relative Policy Optimization (GRPO) | Paper Explained
In this video we dive into Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO). Both are reinforcement learning methods that became very popular through their use in large language models, where they are applied during post-training to align a model with preference data. This preference data is typically captured by a reward model, whose scores define the return the policy is trained to maximize. The video covers the full mathematical derivation of PPO and GRPO, starting from a very intuitive initial idea and working step by step toward the final objectives. Enjoy!

00:00 Introduction
01:17 Problem Statement
03:17 Intuitive Objective
04:07 Analytically Computable Objective
10:11 Return Function
12:07 Value Function
14:53 Importance Sampling
17:40 TRPO
19:16 PPO
21:15 GRPO
23:45 Summary
24:31 Outro

Further Reading:
1. Log-derivative: https://andrewcharlesjones.github.io/...
2. RL Introduction (really good!): https://spinningup.openai.com/en/late...
3. Return Function: https://spinningup.openai.com/en/late...
4. Value Function: https://spinningup.openai.com/en/late...
5. Importance Sampling: • Importance Sampling
6. TRPO Explanation: / rl-trust-region-policy-optimization-trpo-e...
7. TRPO Paper: https://arxiv.org/abs/1502.05477
8. PPO Paper: https://arxiv.org/abs/1707.06347
9. GRPO Paper: https://arxiv.org/abs/2402.03300
10. REINFORCE Paper: https://people.cs.umass.edu/~barto/co...

#ppo #grpo #rlhf #reinforcementlearning #gpt
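For readers who want something concrete next to the derivation, here is a minimal Python sketch of the two objectives this video covers. It assumes a single prompt with a small group of sampled completions, one scalar reward per completion, and one log-probability per completion; a real implementation works per token and batches over many prompts. The helper names `group_relative_advantages` and `ppo_clipped_objective` are hypothetical. The first function follows the GRPO idea of replacing a learned value-function baseline with the mean and standard deviation of the rewards within the group. The second is PPO's clipped surrogate, where the importance ratio between the new and the old policy is clipped to [1 - ε, 1 + ε]. The KL penalty toward a reference model that GRPO adds to its objective is omitted here.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of completions sampled for the same prompt.

    Each reward is normalized by the group mean and standard deviation, so no
    learned value function is needed. Using the population std is an
    assumption of this sketch; implementations differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized), averaged over samples."""
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Importance ratio pi_theta(a|s) / pi_theta_old(a|s), computed in log space
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum gives a pessimistic (lower) bound on the improvement
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical group of 4 completions for one prompt, rewarded 1 (good) or 0 (bad)
    rewards = [1.0, 0.0, 1.0, 0.0]
    adv = group_relative_advantages(rewards)
    # Hypothetical log-probabilities under the old and the updated policy
    logp_old = [-1.2, -0.8, -1.5, -0.9]
    logp_new = [-1.0, -0.9, -1.3, -1.1]
    print("advantages:", [round(a, 3) for a in adv])
    print("objective:", ppo_clipped_objective(logp_new, logp_old, adv))
```

In GRPO the same clipped surrogate is applied with these group-relative advantages, whereas PPO for language models usually obtains its advantages from a learned value function (for example via generalized advantage estimation). The clip keeps a single update from pushing the ratio far outside [1 - ε, 1 + ε] when the advantage is positive, while a negative advantage is never clipped in the direction that would hide a large ratio.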