Most AI safety conversations centre on alignment: ensuring AI systems share our values and goals. But despite progress, we're unlikely to know we've solved the problem before the arrival of human-level and superhuman systems in as little as three years.

So some are developing a backup plan to safely deploy models we fear are actively scheming to harm us — so-called "AI control." While this may sound mad, given the reluctance of AI companies to delay deploying anything they train, not developing such techniques is probably even crazier.

Today's guest — Buck Shlegeris, CEO of Redwood Research — has spent the last few years developing control mechanisms, and for human-level systems they're more plausible than you might think. He argues that given companies' unwillingness to incur large costs for security, accepting the possibility of misalignment and designing robust safeguards might be one of our best remaining options.

Check out the full transcript with links to learn more on the 80,000 Hours website: https://80k.info/buck

Chapters:
• Cold open (00:00:00)
• Who's Buck Shlegeris? (00:01:25)
• What's AI control? (00:01:51)
• Why is AI control hot now? (00:05:46)
• Detecting human vs AI spies (00:10:44)
• Acute vs chronic AI betrayal (00:15:41)
• How to catch AIs trying to escape (00:18:10)
• The cheapest AI control techniques (00:33:18)
• Can we get untrusted models to do trusted work? (00:39:33)
• If we catch a model escaping... will we do anything? (00:51:01)
• Getting AI models to think they've already escaped (00:53:39)
• Will they be able to tell it's a setup? (00:59:01)
• Will AI companies do any of this stuff? (01:01:05)
• Can we just give AIs fewer permissions? (01:07:16)
• Can we stop human spies the same way? (01:11:05)
• The pitch to AI companies to do this (01:16:13)
• Will AIs get superhuman so fast that this is all useless? (01:18:29)
• Risks from AI deliberately doing a bad job (01:19:50)
• Is alignment still useful? (01:26:05)
• Current alignment methods don't detect scheming (01:30:39)
• How to tell if AI control will work (01:33:08)
• How can listeners contribute? (01:37:28)
• Is 'controlling' AIs kind of a dick move? (01:38:51)
• Could 10 safety-focused people in an AGI company do anything useful? (01:44:12)
• Benefits of working outside frontier AI companies (01:49:40)
• Why Redwood Research does what it does (01:53:38)
• What other safety-related research looks best to Buck? (02:01:07)
• If an AI escapes, is it likely to be able to beat humanity from there? (02:02:02)
• Will misaligned models have to go rogue ASAP, before they're ready? (02:09:22)
• Is research on human scheming relevant to AI? (02:10:24)

This episode was originally recorded on February 21, 2025.

Video: Simon Monsour and Luke Monsour
Audio engineering: Ben Cordell, Milo McGuire, and Dominic Armstrong
Transcriptions and web: Katy Moore