Training Agentic Reasoners (Will Brown, Prime Intellect), a talk uploaded to YouTube.
This talk will be a technical deep dive into RL for agentic reasoning via multi-turn tool calling, similar to OpenAI's o3 and Deep Research. In particular, we'll cover:
- When, why, and how to use GRPO vs. PPO vs. other algorithms
- Designing environments and rewards
- A survey of recent research highlights
- Results on example tasks
- An overview of the open-source ecosystem (libraries, compute requirements, tradeoffs, etc.)

About Will Brown
Will Brown is a Research Engineering Lead at Prime Intellect, focusing on RL for reasoning and agents. He previously held research roles at Morgan Stanley and AWS, and completed his PhD in Computer Science at Columbia University.

Recorded at the AI Engineer World's Fair in San Francisco.

Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter

Timestamps
[00:00] Introduction to the idea that reasoning and agents are similar.
[01:05] The growing effectiveness of Reinforcement Learning (RL) in AI.
[03:04] The complexities and challenges of implementing RL.
[04:41] The connection between popular AI products (agents) and RL fine-tuning.
[07:18] The core process of Reinforcement Learning.
[10:21] The importance of tools and real-world tasks for agents.
[12:13] The problem of "reward hacking" and how to design better evaluations.
[14:51] Future directions for agentic systems and a practical toolkit for implementation.
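One ingredient behind the GRPO vs. PPO comparison in the agenda: as GRPO is commonly described, it drops PPO's learned value model and instead scores each sampled completion against the other completions drawn for the same prompt. The sketch below shows only that group-relative advantage step, assuming scalar rewards per rollout; the function name and reward values are hypothetical, and it is not taken from the talk or from any particular library.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each rollout's reward against the group sampled for the same prompt.

    Assumption: population standard deviation; some implementations use the
    sample standard deviation instead. `eps` avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one prompt (1.0 = task solved).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))

# If every rollout gets the same reward, all advantages are zero,
# so that prompt contributes no learning signal in this step.
print(group_relative_advantages([0.5, 0.5, 0.5]))
```

The last case shows why reward design matters: prompts that are always solved or always failed give a zero advantage for every rollout under this normalization.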