Welcome to AI Safety Poland Talks! This is a biweekly series in which researchers, professionals, and enthusiasts from Poland or connected to the Polish AI community share their work on AI Safety.

Topic: Eliciting Secret Knowledge from Language Models
Speaker: Bartosz Cywiński
Language: English
Date: 08.01.2026, 18:00

Bio
Bartosz is a PhD student working on mechanistic interpretability at the Warsaw University of Technology. He is also a MATS 8.0 scholar working with Arthur Conmy and is currently most interested in applied interpretability and model organisms.

Abstract
We want to know what AIs know, even if they don't tell us. This talk covers a study of uncovering secret knowledge from language models. To study this, we build a suite of secret-keeping LLMs, training them to possess secret knowledge that they can use but deny having when asked directly. On this benchmark, we evaluate how well different black-box methods, and white-box methods based on mechanistic interpretability tools, can uncover this secret knowledge.