Building a GPT-2 Model from Scratch by Stefan Schminanski
Speaker: Stefan Schminanski, Principal Engineer at NVIDIA
Slides: TBD
Join Cloud Native Community Heidelberg at https://community.cncf.io/cloud-native-hei...

Abstract:

Everybody is talking about AI and LLMs: attention, transformers, tokens, embeddings, context windows, system prompts, temperature, backpropagation, PyTorch, KV caching, vLLM, llm-d, pre-training, post-training, reinforcement learning. With so many terms flying around, it's easy to feel imposter syndrome in a conversation like that.

Over the last Christmas holidays, I decided to change that. Inspired by Andrej Karpathy's NanoChat project, and following his recommendation to not just run his code but to write everything from the ground up myself, I set out to build a GPT-2 class model. Think of it as "GPT-2 the Hard Way," in the spirit of Kelsey Hightower's Kubernetes the Hard Way.

I started by studying the "Attention Is All You Need" paper, watching many hours of Karpathy's YouTube lectures, and gradually building an intuition for how everything fits together. Then I wrote my own PyTorch implementation of the GPT-2 architecture and put a data pipeline in front of it using freely available datasets from Hugging Face. As a twist, my model is trained exclusively on German data: German internet archive datasets, Wikipedia, Goethe, and transcripts from German YouTube channels.

My bar: only use components I understand well enough to confidently explain them to others, which is what I'll attempt in this talk.
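To give a flavor of the core mechanism the talk builds up to, here is a minimal pure-Python sketch of the scaled dot-product attention with a causal mask from "Attention Is All You Need" (softmax(QK^T / sqrt(d)) V). This is an illustration only, not the speaker's actual PyTorch implementation; the function name and the list-of-rows representation are assumptions made for clarity.

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def causal_attention(Q, K, V):
    """Scaled dot-product attention with a causal mask.

    Q, K, V are lists of d-dimensional row vectors, one per token.
    Position i may only attend to positions 0..i (the causal mask
    that makes GPT-style models autoregressive)."""
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        # Similarity of query i with every visible key, scaled by sqrt(d).
        scores = [sum(qj * kj for qj, kj in zip(q, K[t])) / math.sqrt(d)
                  for t in range(i + 1)]
        weights = softmax(scores)
        # Output i is the attention-weighted average of the visible values.
        out.append([sum(w * V[t][j] for t, w in enumerate(weights))
                    for j in range(len(V[0]))])
    return out
```

Because the first token can only attend to itself, its output row is exactly its own value vector; later rows are convex combinations of all earlier value rows.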