A Walkthrough of Reverse-Engineering Modular Addition: Model Training (Part 1/3)

A coding tutorial on how to reverse-engineer a model trained to grok modular addition! I'm joined by Jess Smith in this replication of our paper, Progress Measures for Grokking via Mechanistic Interpretability. In this part, we train the model to perform modular addition, and see that it groks!

Code: https://neelnanda.io/modular-addition...
Part 2: https://neelnanda.io/modular-addition...
Part 3: https://neelnanda.io/modular-addition...
The paper: https://neelnanda.io/grokking
Getting started in mechanistic interpretability: https://neelnanda.io/getting-started
TransformerLens: https://github.com/neelnanda-io/Trans...
Transformer tutorial: https://neelnanda.io/transformer-tuto...
Original grokking paper: https://arxiv.org/abs/2201.02177

OUTLINE:
0:00 - Intro
0:52 - What even is grokking?
5:09 - Define the tasks
7:23 - Training data fraction rationale
9:46 - Define the model
14:41 - Define optimizer and loss function
17:51 - Training the model
19:30 - Discussion on model size and interpretability
23:46 - What even is mechanistic interpretability?
27:09 - Interlude on the slingshot mechanism
32:55 - The results and conclusion
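
For readers who want a feel for the task before watching, here is a minimal sketch of how a modular addition dataset with a held-out test split could be built. It is not the code from the video: the function name `make_modular_addition_data` is hypothetical, and the modulus `p = 113` and the 30% training fraction are assumed values chosen for illustration, since this description does not state them.

```python
import random


def make_modular_addition_data(p, train_frac, seed=0):
    """Return (train, test) lists of ((a, b), label), where label = (a + b) % p.

    Hypothetical helper for illustration; not taken from the tutorial code.
    """
    # Enumerate every ordered pair (a, b) with 0 <= a, b < p.
    examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(examples)
    # Keep only a fraction of all pairs for training; the rest is held out.
    n_train = int(train_frac * len(examples))
    return examples[:n_train], examples[n_train:]


# Assumed values for illustration: modulus 113, 30% of pairs used for training.
train, test = make_modular_addition_data(p=113, train_frac=0.3)
print(len(train), len(test))
```

Training on only a fraction of all pairs is what makes grokking observable: the model can memorise the training pairs first, and performance on the held-out pairs shows whether it later generalises.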
