Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mahesh highlights the crucial role of data curation, evaluation, and error analysis in model performance, explains why RL offers a more robust alternative to prompting, and shows how it can improve multi-step tool-use capabilities. We also explore the limitations of supervised fine-tuning (SFT) for tool-augmented reasoning tasks, the reward-shaping strategies his team has used (a sketch of the idea appears at the end of these notes), and Bespoke Labs' open-source libraries like Curator. Finally, we touch on MiniCheck for hallucination detection and MiniChart for chart-based QA.

🗒️ For the full list of resources for this episode, visit the show notes page: https://twimlai.com/go/731

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confi...

🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: / twimlai
Follow us on LinkedIn: / twimlai
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/

📖 CHAPTERS
===============================
00:00 - Introduction
03:54 - Importance of data
07:50 - RL as a tool in data curation
10:21 - Curator
12:34 - Contemporary applications of reinforcement learning (RL)
22:33 - Improving models with RL fine-tuning
24:05 - Improving Multi-Turn Tool Use with RL
26:04 - Advantages of RL
31:06 - Reward shaping
33:50 - Findings in applying RL to tool use
35:42 - Examples of applying RL in tool use
40:57 - Compute cost of RL vs. SFT
43:25 - Future of democratizing agentic tools
46:20 - Evaluation of results
49:45 - How multi-turn tool use differs from single-turn
52:46 - MiniChart and MiniCheck
57:32 - Bespoke Labs
58:57 - Future directions

🔗 LINKS & RESOURCES
===============================
Improving Multi-Turn Tool Use with Reinforcement Learning - https://www.bespokelabs.ai/blog/impro...
Bespoke Curator - https://github.com/bespokelabsai/cura...
Bespoke-Minicheck - https://www.bespokelabs.ai/bespoke-mi...
MiniChart Playground - https://playground.bespokelabs.ai/min...

📸 Camera: https://amzn.to/3TQ3zsg
🎙️ Microphone: https://amzn.to/3t5zXeV
🚦 Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5
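
💡 REWARD SHAPING, SKETCHED
===============================
For listeners who want a concrete feel for what "reward shaping" means in multi-turn tool use, here is a minimal, hypothetical Python sketch. It is not Bespoke Labs' actual reward function: the trajectory schema, weights, and field names below are invented for illustration. The idea discussed in the episode is to combine a sparse task-success reward with small dense signals, such as a bonus for well-formed tool calls and a penalty per turn, so the policy gets gradient signal before it can reliably solve full tasks.

```python
# Hypothetical reward shaping for a multi-turn tool-use agent.
# Illustrative only; all weights and schema fields are invented.

from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One rollout of an agent interacting with tools."""
    tool_calls: list = field(default_factory=list)  # e.g. [{"name": "search", "args_valid": True}, ...]
    final_answer: str = ""
    reference_answer: str = ""


def shaped_reward(traj: Trajectory) -> float:
    """Combine a sparse task reward with dense shaping terms.

    - Task reward: 1.0 if the final answer matches the reference (exact match
      here for simplicity; a real setup might use a judge model or unit tests).
    - Format bonus: small reward per syntactically valid tool call, so the
      policy first learns callable tool syntax.
    - Length penalty: small cost per turn to discourage aimless tool use.
    """
    task_reward = 1.0 if traj.final_answer.strip() == traj.reference_answer.strip() else 0.0
    valid_calls = sum(1 for call in traj.tool_calls if call.get("args_valid"))
    format_bonus = 0.05 * valid_calls
    length_penalty = 0.02 * len(traj.tool_calls)
    return task_reward + format_bonus - length_penalty


if __name__ == "__main__":
    traj = Trajectory(
        tool_calls=[{"name": "search", "args_valid": True},
                    {"name": "calculator", "args_valid": True}],
        final_answer="42",
        reference_answer="42",
    )
    print(f"shaped reward: {shaped_reward(traj):.2f}")  # 1.00 + 0.10 - 0.04 = 1.06
```

For the reward design Bespoke Labs actually used, see their blog post "Improving Multi-Turn Tool Use with Reinforcement Learning" linked above.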