Early stages of the reinforcement learning era of language models
Hey friends! This is a recent talk I gave at the UC Santa Cruz Silicon Valley Extension to their Natural Language Processing (NLP) master's students, doctoral students, alumni, and friends. In this talk I cover the recent trend of reinforcement finetuning of language models: how it came about, how it is done technically, early experiments using it at Ai2, and recent mainstream releases utilizing it (DeepSeek R1, Claude 3.7, Grok 3, etc.). I conclude with a look at a future of extensive RL training rather than just finetuning.

You can find the slides here: https://docs.google.com/presentation/...

Or, the full recording with talks from Alessio of Latent Space and Dylan of SemiAnalysis here: • Frontiers of AI: Language, Inference, and ...

This is closely related to a recent talk I gave on my primary Interconnects channel: • An Unexpected Reinforcement Learning Renai...

Thanks Sam & Jeff for hosting me! The next talk I post will include some more hot-off-the-press RL research than this one :D