What Happened With Sparse Autoencoders?
Warning: This is an ad-libbed talk, and I'm sure I got some facts wrong. This is a talk I gave to my MATS 9.0 training program on the saga of sparse autoencoders as a mechanistic interpretability technique, what they can and can't do, and the highs and lows that resulted.

Chapters:
00:00:00 Early Dictionary Learning
00:03:04 The Grand Claim (And Its Flaws)
00:06:51 Problems with Sparsity
00:10:01 Polysemanticity
00:18:05 Limits of Max Activating Dataset Examples
00:23:22 The Hype and Community Dynamics
00:25:24 Critiquing the Metrics
00:31:21 Two Goals for Interpretability
00:34:13 Why SAEs Can't Prove Safety
00:38:17 Scaling to Real Models
00:42:39 The Goal: Reverse Engineering vs. Tooling
00:44:51 Failure on Supervised Tasks
00:49:00 The Power of Unsupervised Discovery
00:52:37 The Othello Anomaly
00:55:37 A Linear Representation in Disguise
01:01:00 Finding Novel Concepts
01:06:05 Case Study: Golden Gate Claude
01:09:00 Probing for Hallucinations
01:13:25 Making a Better Dataset
01:16:20 From Discovery to Supervised Tools
01:19:00 Downstream Tasks for Understanding
01:22:00 Generating Hypotheses about Data
01:28:13 Finding Surprising Correlations
01:31:45 Dataset Diffing
01:34:22 Pathologies: When Sparsity Fails
01:37:00 Composition and Feature Splitting
01:39:53 A Solution: Matryoshka SAEs
01:45:24 Better Metrics, Worse Performance
01:47:38 Lessons from the Saga
01:49:01 The Field Today
01:50:47 SAEs as One Tool Among Many
01:53:34 LLMs for Chain-of-Thought Analysis
01:59:22 Precision vs. Discovery
02:01:24 Transcoder Attribution Graphs
02:04:32 Case Study: Implicit Planning
02:08:13 Does Fine-Grained Detail Matter?
02:15:50 The Problem with Error Compounding
02:19:21 The Role of Error Nodes
02:22:45 Why Error Nodes Obscure Analysis
02:26:00 Final Takeaways
02:30:46 Where Dictionary Learning is Now
02:34:34 The Challenge of Reasoning Models
02:36:54 Q&A: Low-Error Transcoders
02:40:09 Q&A: Error Nodes and Overfitting
02:42:20 Q&A: End-to-End SAEs
02:46:03 Q&A: Baselines for Hypothesis Generation
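For viewers unfamiliar with the talk's subject, a sparse autoencoder in this context is a wide dictionary trained to reconstruct a model's activations from a sparse set of nonnegative feature activations. A minimal sketch (dimensions, initialization, and the L1 penalty weight here are illustrative assumptions, not details taken from the talk):

```python
import numpy as np

# Minimal sparse autoencoder sketch. The dictionary (d_dict) is wider
# than the activation space (d_model), and sparsity is encouraged by
# an L1 penalty on the feature activations during training.
rng = np.random.default_rng(0)

d_model, d_dict = 16, 64  # illustrative sizes, not from the talk
W_enc = rng.normal(0.0, 0.1, (d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(0.0, 0.1, (d_dict, d_model))

def encode(x):
    # ReLU keeps feature activations nonnegative and (with training) sparse
    return np.maximum(0.0, x @ W_enc + b_enc)

def decode(f):
    # Reconstruction is a sparse linear combination of dictionary rows
    return f @ W_dec

x = rng.normal(size=(4, d_model))  # stand-in for a batch of model activations
f = encode(x)
x_hat = decode(f)

# Training would minimize reconstruction error plus a sparsity penalty:
loss = np.mean((x - x_hat) ** 2) + 1e-3 * np.abs(f).mean()
```

The hope discussed in the talk is that individual rows of `W_dec` come to represent human-interpretable features; the chapters on polysemanticity, feature splitting, and Matryoshka SAEs cover ways this hope does and does not pan out.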