646-Steering and Monitoring AI Models
Researchers have developed a scalable method called the Recursive Feature Machine (RFM) to identify and manipulate the internal knowledge of artificial intelligence models. The approach extracts linear concept representations from a model's activations and uses them for model steering: shifting the model's behavior toward a specific concept such as a language, a political stance, or coding proficiency. The study reports that this technique improves AI safety and performance across various architectures, often surpassing traditional prompting. The same internal features also prove effective for monitoring hallucinations and toxic content, outperforming even advanced judge models like GPT-4o. The findings suggest that model capabilities can be significantly enhanced by engaging directly with internal activation spaces rather than relying solely on external text interactions.

References:
• Beaglehole D, Radhakrishnan A, Boix-Adsera E, et al. Toward universal steering and monitoring of AI models. Science, 2026, 391(6787): 787-792.
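To make the idea of steering and monitoring with a linear concept representation more concrete, here is a minimal sketch in plain Python. It is not the RFM method from the paper: as a stand-in, it estimates a concept direction as the difference of class means over synthetic "activations". All names (`concept_direction`, `steer`, `monitor_score`) and the toy data are assumptions made for illustration only.

```python
import random

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def concept_direction(pos, neg):
    """Unit-norm difference of class means.

    Assumption: a simple stand-in for a learned concept vector,
    not the RFM procedure described in the paper.
    """
    mean_pos, mean_neg = mean_vector(pos), mean_vector(neg)
    diff = [a - b for a, b in zip(mean_pos, mean_neg)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]

def steer(activation, direction, strength):
    """Steering: shift an activation along the concept direction."""
    return [a + strength * d for a, d in zip(activation, direction)]

def monitor_score(activation, direction):
    """Monitoring: project an activation onto the concept direction."""
    return sum(a * d for a, d in zip(activation, direction))

# Synthetic activations (hypothetical): the concept shifts coordinate 0.
rng = random.Random(0)
DIM = 8

def sample(shift):
    v = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    v[0] += shift
    return v

with_concept = [sample(3.0) for _ in range(200)]
without_concept = [sample(0.0) for _ in range(200)]

direction = concept_direction(with_concept, without_concept)
baseline = sample(0.0)
steered = steer(baseline, direction, strength=4.0)

print("score before steering:", monitor_score(baseline, direction))
print("score after steering: ", monitor_score(steered, direction))
```

Because the direction has unit norm, steering with strength `s` raises the monitoring score by exactly `s`. In a real model the vectors would be hidden-layer activations, and the choice of layer and steering strength would have to be tuned empirically.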