[State of MechInterp] SAEs in Production, Circuit Tracing, AI4Science, "Pragmatic" Interp — Goodfire

From PhD research on grounding and language models to shipping interpretability tools in production at Goodfire, Jack Merullo and Mark Bissell are building the infrastructure to crack open the black box, making models not just powerful but understandable, steerable, and safe. We caught up with them live at NeurIPS to dig into the state of mechanistic interpretability heading into 2026. Topics include why interpretability is no longer just a research curiosity but a practical tool for deployment in high-stakes industries (healthcare, finance, life sciences); how Goodfire's platform turns unsupervised feature discovery into real-world applications like paint.goodfire.ai (painting directly into Stable Diffusion's mental map) and PII detection for Rakuten (500x cheaper than GPT-5 as a judge, with higher recall than LLM-based methods); why the memorization vs. reasoning spectrum matters more than a binary "memorized or not" framing (factual recall sits somewhere between rote memorization and logical reasoning); how cross-layer transcoders and circuit tracing are scaling interpretability across every layer of a model (creating attribution graphs through interpretable features); why Neel Nanda's pivot to pragmatic interpretability isn't a retreat but a validation that interp is ready for real-world impact (managing by outcome, not just reverse-engineering from scratch); and the Pasteur's Quadrant philosophy that drives Goodfire (bouncing between foundational research and applied use cases, like Pasteur pioneering germ theory and vaccines). Their thesis is that interpretability unlocks latent capacity in models that you simply can't access by treating them as black boxes, whether that's finding novel biomarkers of disease in genomics models, scrubbing PII from customer chats, or giving creative tools direct access to a diffusion model's internal concept map.
We discuss:

Jack's path: PhD student (2020–2025) working on grounding in language models → full pivot to interpretability research → Goodfire
Mark's path: Palantir healthcare engineer → Goodfire in March 2024, focused on applied interpretability and platform engineering
What Goodfire does: AI research company building a platform for interpreting models across modalities and domains, focused on real-world deployment in high-stakes industries
paint.goodfire.ai: using interpretability to paint directly into Stable Diffusion's mental map. Unsupervised concept discovery (animals, backgrounds, scenes) lets you select and paint concepts on a 2D canvas, a totally new interface for a text-only model
Fact editing and the ROME paper (rank-one model editing): updating facts ("capital of France is now Marseille") is an interpretability problem, not yet deployed at scale
What's new in 2025: interpretability showing up in model cards, evals, red teaming exercises (Gemini, Claude), and real production deployments like Rakuten's PII detection
AI for science: *narrowly superhuman models in genomics, medical imaging, proteomics, materials science*. Interpretability unlocks novel biomarkers of disease and scientific discovery in domains humans can't natively understand (base pairs in, base pairs out)
Cross-layer transcoders and circuit tracing: scaling SAEs across every layer, tying features across layers, and creating attribution graphs to trace how models produce outputs through interpretable primitives
Why Neel Nanda's pivot to pragmatic interpretability isn't "interpretability is dead" but a validation that interp is ready for real-world impact: managing by outcome, not just reverse-engineering models from scratch
Pasteur's Quadrant: bouncing between foundational research (Niels Bohr understanding the atom) and applied research (Edison inventing the light bulb), with Pasteur as the model for doing both (germ theory and vaccines)

Goodfire Team
Goodfire: https://goodfire.ai
Paint demo: https://paint.goodfire.ai
Careers: https://goodfire.ai/careers

00:00:00 Introduction: Goodfire's Mission and the State of Mechanistic Interpretability
00:00:56 From Grounding to Interpretability: Jack's PhD Journey
00:03:04 Paint.Goodfire.ai: Interpretability for Creative Control
00:05:30 Disentangling Memorization from Reasoning: The Spectrum of Model Capabilities
00:06:40 Unlearning vs Suppression: The Challenge of Removing Information
00:08:15 Real-World Deployments: PII Detection and Enterprise Applications
00:10:20 AI for Science: Genomics, Proteomics, and Novel Biomarker Discovery
00:11:20 Circuit Tracing and Cross-Layer Transcoders: Anthropic's Contribution
00:15:44 The Neel Nanda Pivot: Pragmatic Interpretability and Managing by Outcome
00:18:37 Pasteur's Quadrant: Balancing Discovery and Application at Goodfire
