The dominant approach to aligning large language models with human preferences is reinforcement learning from human feedback (RLHF): finetuning an LLM to maximise a reward function representing human preferences. In this talk, I will try to present a complementary perspective: that we can also think about LLM alignment not in terms of reward maximisation but in terms of conditioning LMs on evidence about human preferences. First, I will explain how maximising the classic RLHF objective is equivalent to approximate Bayesian inference. Then, I will go on to argue that the conditioning view also inspires other approaches to LLM alignment. I will discuss three: minimising different f-divergences from a target distribution, learning from feedback expressed in natural language, and aligning LMs already during pretraining by directly learning a distribution conditional on an alignment score. I will end the talk by discussing how these approaches correspond to conditioning successive priors on successive pieces of evidence about human preferences.

Tomek Korbak is a Senior Research Scientist at the UK AI Safety Institute working on safety cases for frontier models. Previously, he was a Member of Technical Staff at Anthropic working on honesty. Before that, he did a PhD at the University of Sussex focusing on RLHF and spent time as a visiting researcher at NYU working with Ethan Perez, Sam Bowman and Kyunghyun Cho. He studied cognitive science, philosophy and physics at the University of Warsaw.

This talk was one of the Invited Talks at the ML in PL Conference 2024.
ML in PL Conference 2024 website: https://conference2024.mlinpl.org
ML in PL Association website: https://mlinpl.org
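
The Bayesian-inference equivalence mentioned in the abstract can be sketched as follows; the notation (policy \pi_\theta, pretrained prior \pi_0, reward r, KL coefficient \beta) is illustrative and assumes the standard KL-regularised RLHF objective rather than anything specific to the talk.

The KL-regularised objective trains a policy \pi_\theta to maximise reward while staying close to the pretrained model \pi_0:

\[
  J(\theta) = \mathbb{E}_{x \sim \pi_\theta}\big[ r(x) \big] - \beta\,\mathrm{KL}\big( \pi_\theta \,\|\, \pi_0 \big).
\]

Up to an additive constant independent of \theta, maximising J(\theta) is the same as minimising the KL divergence to a fixed target distribution:

\[
  \pi^{*}(x) \propto \pi_0(x)\,\exp\!\big( r(x)/\beta \big),
  \qquad
  \arg\max_\theta J(\theta) = \arg\min_\theta \mathrm{KL}\big( \pi_\theta \,\|\, \pi^{*} \big).
\]

Here \pi^{*} has the form of a Bayesian posterior: the pretrained model \pi_0 is the prior, and \exp(r(x)/\beta) plays the role of a likelihood of the evidence that humans prefer x. Finetuning with the RLHF objective is therefore approximate (variational) inference of this posterior, i.e. conditioning the LM on evidence about human preferences rather than reward maximisation per se.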