Are we failing to understand the exponential, again? My guest is Julian Schrittwieser (top AI researcher at Anthropic; previously Google DeepMind on AlphaGo Zero & MuZero). We unpack his viral post ("Failing to Understand the Exponential, again") and what it looks like when task length doubles every 3–4 months—pointing to AI agents that can work a full day autonomously by 2026 and expert-level breadth by 2027.

We talk about the original Move 37 moment and whether today's AI models can spark alien insights in code, math, and science—including Julian's timeline for when AI could produce Nobel-level breakthroughs.

We go deep on the recipe of the moment—pre-training + RL—why it took time to combine them, what "RL from scratch" gets right and wrong, and how implicit world models show up in LLM agents. Julian explains the current rewards frontier (human prefs, rubrics, RLVR, process rewards), what we know about compute & scaling for RL, and why most builders should start with tools + prompts before considering RL-as-a-service.

We also cover evals & Goodhart's law (e.g., GDP-Val vs real usage), the latest in mechanistic interpretability (think "Golden Gate Claude"), and how safety & alignment actually surface in Anthropic's launch process.

Finally, we zoom out: what 10× knowledge-work productivity could unlock across medicine, energy, and materials, how jobs adapt (complementarity over 1-for-1 replacement), and why the near term is likely a smooth ramp—fast, but not a discontinuity.

Julian Schrittwieser
Blog - https://www.julian.ac
X/Twitter - https://x.com/mononofu
Viral post: Failing to Understand the Exponential, Again (9/27/2025)

Anthropic
Website - https://www.anthropic.com
X/Twitter - https://x.com/anthropicai

Matt Turck (Managing Director)
Blog - https://www.mattturck.com
LinkedIn - / turck
X/Twitter - / mattturck

FIRSTMARK
Website - https://firstmark.com
X/Twitter - / firstmarkcap

LISTEN ON:
Spotify - https://open.spotify.com/show/7yLATDS...
Apple Podcasts - https://podcasts.apple.com/us/podcast...

00:00 Cold open — "We're not seeing any slowdown."
00:32 Intro — who Julian is & what we cover
01:09 The "exponential" from inside frontier labs
04:46 2026–2027: agents that work a full day; expert-level breadth
08:58 Benchmarks vs reality: long-horizon work, GDP-Val, user value
10:26 Move 37 — what actually happened and why it mattered
13:55 Novel science: AlphaCode/AlphaTensor → when does AI earn a Nobel?
16:25 Discontinuity vs smooth progress (and warning signs)
19:08 Does pre-training + RL get us there? (AGI debates aside)
20:55 Sutton's "RL from scratch"? Julian's take
23:03 Julian's path: Google → DeepMind → Anthropic
26:45 AlphaGo (learn + search) in plain English
30:16 AlphaGo Zero (no human data)
31:00 AlphaZero (one algorithm: Go, chess, shogi)
31:46 MuZero (planning with a learned world model)
33:23 Lessons for today's agents: search + learning at scale
34:57 Do LLMs already have implicit world models?
39:02 Why RL on LLMs took time (stability, feedback loops)
41:43 Compute & scaling for RL — what we see so far
42:35 Rewards frontier: human prefs, rubrics, RLVR, process rewards
44:36 RL training data & the "flywheel" (and why quality matters)
48:02 RL & Agents 101 — why RL unlocks robustness
50:51 Should builders use RL-as-a-service? Or just tools + prompts?
52:18 What's missing for dependable agents (capability vs engineering)
53:51 Evals & Goodhart — internal vs external benchmarks
57:35 Mechanistic interpretability & "Golden Gate Claude"
1:00:03 Safety & alignment at Anthropic — how it shows up in practice
1:03:48 Jobs: human–AI complementarity (comparative advantage)
1:06:33 Inequality, policy, and the case for 10× productivity → abundance
1:09:24 Closing thoughts