Composition-RL is a methodology that addresses the data-efficiency degradation that arises during reinforcement learning of large language models (LLMs): as training progresses, the model solves the existing problems so easily that they no longer provide a learning signal. To counter this, the authors automatically combine existing verifiable problems into new, harder composite problems. This Sequential Prompt Composition (SPC) approach exposes the model to complex reasoning demands while providing implicit supervision over the solution process, leading to better performance than simple data mixing. Experiments across model sizes from 4B to 30B show consistent improvements in reasoning across multiple domains, including mathematics and physics. In particular, a curriculum-learning strategy that gradually increases problem complexity can effectively extend the model's performance limits even with limited data. https://arxiv.org/pdf/2602.12036
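The core idea of composing verifiable problems can be sketched roughly as follows. This is a minimal illustrative mock-up, not the authors' implementation: the problem types, the chaining scheme (feeding one problem's answer into the next), and all names (`VerifiableProblem`, `compose`, `reward`) are assumptions for exposition only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class VerifiableProblem:
    # Prompt template with an {x} slot that receives the previous step's answer.
    template: str
    # Ground-truth solver, used only to compute the verifiable target answer.
    solve: Callable[[int], int]

def compose(problems: List[VerifiableProblem], seed: int) -> Tuple[str, int]:
    """Chain simple verifiable problems into one composite prompt.

    The intermediate answer propagates from step to step, so the composite
    problem is harder than any single component, yet its final answer
    remains automatically checkable.
    """
    parts, value = [], seed
    for i, p in enumerate(problems, 1):
        parts.append(f"Step {i}: {p.template.format(x=value)}")
        value = p.solve(value)  # propagate the intermediate result
    prompt = "\n".join(parts) + "\nReport only the final answer."
    return prompt, value

def reward(model_answer: int, target: int) -> float:
    # Binary verifiable reward, as in standard RL-with-verifiable-rewards setups.
    return 1.0 if model_answer == target else 0.0

# Two toy single-step problems composed into one chained two-step problem.
p1 = VerifiableProblem("Multiply {x} by 3.", lambda x: x * 3)
p2 = VerifiableProblem("Add 7 to the previous result ({x}).", lambda x: x + 7)
prompt, answer = compose([p1, p2], seed=5)
print(answer)  # 5 * 3 = 15, then 15 + 7 = 22
```

Under this reading, the curriculum strategy mentioned above would correspond to gradually increasing the number of chained problems per composite prompt as training progresses.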