More Fruitful SFT by Respecting the Learner's Distribution | Dylan Zhang
Speaker: Dylan Zhang (UIUC)

Abstract: Classic supervised fine-tuning (SFT) often ignores the learner's own distribution, treating supervision as universally valid even when it differs from what the model would naturally produce. This mismatch can lead to inefficiencies and unexpected behavior during LLM post-training. In this talk, Dylan Zhang presents two methods built on the idea that supervision should respect the learner's distribution. GRAPE improves SFT through model-aware data selection, choosing responses that are most likely under the target model. PEAR addresses the mismatch between offline SFT and online RL by reweighting the training loss based on how likely the model is to generate each response. Together, these approaches show that simple, policy-aware adjustments to SFT can significantly improve post-training performance.

Bio: Dylan Zhang is a Ph.D. student at the University of Illinois Urbana-Champaign (UIUC), advised by Prof. Hao Peng. His research focuses on LLM post-training, model alignment, and understanding how large language models learn and generalize.
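The abstract only sketches the two ideas at a high level. A minimal, hypothetical illustration of both mechanisms — selecting the candidate response the target model finds most likely (GRAPE's selection idea) and scaling each example's SFT loss by its generation likelihood (PEAR's reweighting idea) — might look like the following. The function names, the length-normalized log-likelihood, and the exponential weighting with a temperature `alpha` are all assumptions for illustration, not the papers' actual formulas:

```python
import math

def select_by_likelihood(candidates):
    """GRAPE-style selection (illustrative): among candidate responses
    to the same prompt, keep the one with the highest length-normalized
    log-likelihood under the target model. Each candidate is given here
    as a list of per-token log-probabilities."""
    return max(candidates, key=lambda lps: sum(lps) / len(lps))

def response_weight(token_logprobs, alpha=1.0):
    """PEAR-style weight (illustrative): map a response's average
    per-token log-likelihood to a positive weight; responses the model
    would rarely generate are down-weighted. `alpha` (an assumed knob)
    controls how sharp the down-weighting is."""
    avg_lp = sum(token_logprobs) / len(token_logprobs)
    return math.exp(alpha * avg_lp)

def weighted_sft_loss(examples, alpha=1.0):
    """Reweighted SFT objective: each example's negative log-likelihood
    is scaled by its likelihood-based weight, so supervision close to
    the model's own distribution dominates the gradient."""
    total, norm = 0.0, 0.0
    for token_logprobs in examples:
        w = response_weight(token_logprobs, alpha)
        nll = -sum(token_logprobs)  # standard SFT loss for this example
        total += w * nll
        norm += w
    return total / norm

# Toy per-token log-probs: one response the model finds likely,
# one it finds unlikely.
likely = [-0.1, -0.2, -0.1]
unlikely = [-2.0, -3.0, -2.5]
```

Under this toy setup, `select_by_likelihood([unlikely, likely])` returns the likely response, and `weighted_sft_loss([likely, unlikely])` is pulled toward the likely example's small loss rather than the unlikely example's large one — the policy-aware adjustment the talk describes.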