Neel Nanda: Mechanistic Interpretability (HAAISS 2024)
Neel Nanda presents a comprehensive overview of mechanistic interpretability in AI, focusing on techniques for understanding the internal algorithms of neural networks. He explains how the field aims to reverse-engineer networks by identifying interpretable features and circuits, using tools such as sparse autoencoders and activation patching (both sketched below). A core working hypothesis is that neural networks implement human-comprehensible algorithms that rigorous analysis can recover.

Nanda discusses superposition, where models represent more features than they have dimensions by storing them in overlapping directions, and highlights sparse autoencoders as a promising technique for decomposing model representations into those features. He emphasizes the field's relevance to AI safety, particularly for distinguishing genuinely aligned behavior from deceptive alignment. The talk covers both theoretical foundations and practical applications, including recent work on frontier models such as GPT-4 and Claude, while acknowledging current limitations and open research directions.
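As a concrete illustration of the decomposition idea, here is a minimal sparse-autoencoder sketch in PyTorch. The dimensions, hyperparameters, and loss weighting are illustrative assumptions rather than the specifics of Nanda's or any published setup: an overcomplete ReLU encoder with an L1 sparsity penalty learns a dictionary in which most features are zero on any given input, which is what lets it pull superposed features apart.

```python
# Minimal sparse autoencoder (SAE) sketch for decomposing model activations.
# All sizes and coefficients below are illustrative assumptions.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        # Overcomplete dictionary: d_features >> d_model, so features that
        # were superposed in d_model dimensions can each get their own unit.
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative and encourages sparsity.
        f = torch.relu(self.encoder(x))
        x_hat = self.decoder(f)
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff: float = 1e-3):
    # Reconstruction term keeps the dictionary faithful to the activations;
    # the L1 term pushes most features to zero on any single input.
    recon = (x - x_hat).pow(2).mean()
    sparsity = f.abs().mean()
    return recon + l1_coeff * sparsity

# Toy usage: in real work `acts` would be cached activations from a model
# (e.g. its residual stream); random data here just shows the mechanics.
acts = torch.randn(64, 512)                      # 64 activation vectors, d_model=512
sae = SparseAutoencoder(d_model=512, d_features=4096)
x_hat, f = sae(acts)
loss = sae_loss(acts, x_hat, f)
loss.backward()
```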
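Activation patching can likewise be sketched in a few lines. The toy MLP below stands in for a transformer and is purely an assumption for illustration; in real work one would patch, say, a specific attention head's output at a specific token position. The idea is to cache an activation from a "clean" run, splice it into a "corrupted" run, and measure how much of the clean behavior returns.

```python
# Activation patching sketch using PyTorch forward hooks.
# The model, layer choice, and inputs are illustrative assumptions.

import torch
import torch.nn as nn

# Toy model standing in for a transformer.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),   # we patch the output of the Linear at index 2
    nn.Linear(32, 10),
)
target_layer = model[2]

cache = {}

def save_hook(module, inp, out):
    # Record the target layer's activation during the clean run.
    cache["act"] = out.detach()

def patch_hook(module, inp, out):
    # Returning a tensor from a forward hook replaces the layer's output.
    return cache["act"]

clean_x = torch.randn(1, 16)
corrupt_x = torch.randn(1, 16)

# 1. Clean run: cache the activation at the chosen layer.
h = target_layer.register_forward_hook(save_hook)
clean_logits = model(clean_x)
h.remove()

# 2. Corrupted run with the clean activation patched in.
h = target_layer.register_forward_hook(patch_hook)
patched_logits = model(corrupt_x)
h.remove()

# 3. Baseline corrupted run, no patch, for comparison.
corrupt_logits = model(corrupt_x)

# Comparing patched_logits against corrupt_logits (relative to clean_logits)
# indicates how much of the "clean" behavior this one activation carries.
```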