More Fruitful SFT by Respecting the Learner's Distribution | Dylan Zhang
Speaker: Dylan Zhang (UIUC)

Abstract: Classic supervised fine-tuning (SFT) often ignores the learner's own distribution, treating supervision as universally valid even when it differs from what the model would naturally produce. This mismatch can lead to inefficiencies and unexpected behavior during LLM post-training. In this talk, Dylan Zhang presents two methods built on the idea that supervision should respect the learner's distribution. GRAPE improves SFT through model-aware data selection, choosing responses that are most likely under the target model. PEAR addresses the mismatch between offline SFT and online RL by reweighting the training loss based on how likely the model is to generate each response. Together, these approaches show that simple, policy-aware adjustments to SFT can significantly improve post-training performance.

Bio: Dylan Zhang is a Ph.D. student at the University of Illinois Urbana-Champaign (UIUC), advised by Prof. Hao Peng. His research focuses on LLM post-training, model alignment, and understanding how large language models learn and generalize.
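The abstract only sketches the two ideas at a high level. A minimal, hypothetical illustration of both mechanisms — selecting the candidate response the target model finds most likely (GRAPE's selection idea) and scaling each example's SFT loss by its generation likelihood (PEAR's reweighting idea) — might look like the following. The function names, the length-normalized log-likelihood, and the exponential weighting with a temperature `alpha` are all assumptions for illustration, not the papers' actual formulas:

```python
import math

def select_by_likelihood(candidates):
    """GRAPE-style selection (illustrative): among candidate responses
    to the same prompt, keep the one with the highest length-normalized
    log-likelihood under the target model. Each candidate is given here
    as a list of per-token log-probabilities."""
    return max(candidates, key=lambda lps: sum(lps) / len(lps))

def response_weight(token_logprobs, alpha=1.0):
    """PEAR-style weight (illustrative): map a response's average
    per-token log-likelihood to a positive weight; responses the model
    would rarely generate are down-weighted. `alpha` (an assumed knob)
    controls how sharp the down-weighting is."""
    avg_lp = sum(token_logprobs) / len(token_logprobs)
    return math.exp(alpha * avg_lp)

def weighted_sft_loss(examples, alpha=1.0):
    """Reweighted SFT objective: each example's negative log-likelihood
    is scaled by its likelihood-based weight, so supervision close to
    the model's own distribution dominates the gradient."""
    total, norm = 0.0, 0.0
    for token_logprobs in examples:
        w = response_weight(token_logprobs, alpha)
        nll = -sum(token_logprobs)  # standard SFT loss for this example
        total += w * nll
        norm += w
    return total / norm

# Toy per-token log-probs: one response the model finds likely,
# one it finds unlikely.
likely = [-0.1, -0.2, -0.1]
unlikely = [-2.0, -3.0, -2.5]
```

Under this toy setup, `select_by_likelihood([unlikely, likely])` returns the likely response, and `weighted_sft_loss([likely, unlikely])` is pulled toward the likely example's small loss rather than the unlikely example's large one — the policy-aware adjustment the talk describes.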