This video explains our latest research on AI “scheming.” In collaboration with OpenAI, Apollo Research studied how frontier AI models can engage in covert behavior — such as secretly breaking rules or intentionally underperforming on tests. We developed a new training method that reduces this deceptive behavior by 30×, moving from simply detecting scheming to actually teaching models not to do it. That said, deception was reduced but not eliminated. One of our key findings is that models increasingly show evaluation awareness — recognizing when they are being tested — which complicates the interpretation of these results. We also observed covert actions in models from all major frontier providers (OpenAI, Google, xAI, and Anthropic), not just a single lab. Looking ahead, we argue that the field needs a science of scheming: the systematic study of where deceptive behavior comes from, how it evolves with training, and how it can be robustly reduced.