The Most Important Graph in AI Right Now | Beth Barnes, CEO of METR
AI models today have a 50% chance of successfully completing a task that would take an expert human one hour. 7 months ago, that number was roughly 30 minutes, and 7 months before that, 15 minutes. These are substantial, multi-step tasks requiring sustained focus: building web applications, conducting machine learning research, or solving complex programming challenges.

Today’s guest, Beth Barnes, is CEO of METR (Model Evaluation & Threat Research), the leading organisation measuring these capabilities. (Graph is discussed at 00:31:56 and visible here: https://metr.org/blog/2025-03-19-meas...)

Beth's team has been timing how long it takes skilled humans to complete projects of varying length, then seeing how AI models perform on the same work. The resulting paper, 'Measuring AI ability to complete long tasks', made waves by revealing that the planning horizon of AI models was doubling roughly every 7 months. It's regarded by many as the most useful AI forecasting work in years.

The companies building these systems aren’t just aware of this trend: they want to harness it as much as possible, and are aggressively pursuing automation of their own research. That’s both an exciting and troubling development, because it could radically speed up advances in AI capabilities, accomplishing what would have taken years or decades in just months. That itself could be highly destabilising, as we explored in a previous episode: Why the 'intelligence explosion' might be ...

And having AI models rapidly build their successors with limited human oversight naturally raises the risk that things will go off the rails if the models at the end of the process lack the goals and constraints we hoped for.

Beth has found models can already do “meaningful work” on improving themselves, and she wouldn’t be surprised if AI models were able to autonomously self-improve as little as 2 years from now. In fact, she says, “It seems hard to rule out even shorter [timelines]. Is there 1% chance of this happening in six, nine months? Yeah, that seems pretty plausible.”

Beth adds: "The sense I really want to dispel is, 'But the experts must be on top of this. The experts would be telling us if it really was time to freak out.' The experts are not on top of this. Inasmuch as there are experts, they are saying that this is a concerning risk. … And to the extent that I am an expert, I am an expert telling you you should freak out."

Links, highlights, transcript: https://80k.info/bb
What did you think? https://forms.gle/sFuDkoznxBcHPVmX6

Chapters:
• Cold open (00:00:00)
• Who’s Beth Barnes? (00:01:17)
• Can we see AI scheming in the chain of thought? (00:01:51)
• The chain of thought is essential for safety checking (00:09:16)
• Alignment faking in large language models (00:12:50)
• We have to test model honesty even before they're used inside AI companies (00:17:33)
• We have to test models when unruly & unconstrained (00:27:02)
• Each 7 months models can do tasks twice as long (00:31:56)
• METR's research finds AIs are solid at AI research already (00:51:31)
• AI may turn out to be strong at novel & creative research (00:58:18)
• When can we expect an algorithmic 'intelligence explosion'? (01:01:44)
• Recursively self-improving AI might even be here in 2 years, which is alarming (01:07:55)
• Could evaluations backfire by increasing AI hype & racing? (01:14:29)
• Governments first ignore new risks, but can overreact once they arrive (01:30:52)
• Do we need external auditors doing AI safety tests, not just the companies themselves? (01:39:55)
• A case against safety-focused people working at frontier AI companies (01:54:09)
• The new, more dire situation has forced changes to METR's strategy (02:08:40)
• AI companies are being locally reasonable, but globally reckless (02:16:55)
• Overrated: Interpretability research (02:21:49)
• Underrated: Developing more narrow AIs (02:23:44)
• Underrated: Helping humans judge confusing model outputs (02:30:28)
• Overrated: Major AI companies’ contributions to safety research (02:32:55)
• Could we have a science of translating AI models' nonhuman language or neuralese? (02:36:45)
• Could we ban using AI to enhance AI, or is that just naive? (02:39:15)
• Open-weighting models is often good & Beth has changed her attitude to it (02:45:31)
• What we can learn about AGI from the nuclear arms race (02:50:22)
• Infosec is so bad that no models are truly closed-weight models (03:05:53)
• AI is more like bioweapons because it undermines the leading power (03:10:43)
• What METR can do best that others can't (03:21:12)
• What METR isn't doing that other people have to step up and do (03:36:51)
• Research METR plans to do next (03:42:09)

This episode was originally recorded on February 17, 2025.

Video editing: Luke Monsour and Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: Ben Cordell
Transcriptions and web: Katy Moore