Distillation of Transformer Models
➡️ Get Lifetime Access to the Complete Scripts (and future improvements): https://Trelis.com/ADVANCED-fine-tuning
➡️ One-click fine-tuning and LLM templates: https://github.com/TrelisResearch/one...
➡️ Newsletter: https://blog.Trelis.com
➡️ Resources/Support/Discord: https://Trelis.com/About
➡️ Thumbnail made with this tutorial: • Fine Tune Flux Diffusion Models with ...

With credit to Rohan Sharma for his work on these scripts during a Trelis Internship: https://trelis.com/internships/. Find Rohan on GitHub: https://github.com/rs545837/
Thanks also to Elie Bakouch of Hugging Face for guidance on using the SmolLM corpus: https://huggingface.co/eliebak

VIDEO RESOURCES:
Slides: https://docs.google.com/presentation/...
Minitron Distillation Paper: https://d1qx31qr3h6wln.cloudfront.net...
Distil-Whisper Paper: https://arxiv.org/pdf/2311.00430
SmolLM Corpus: https://huggingface.co/datasets/Huggi...
Trelis SmolLM 2% split: https://huggingface.co/datasets/Treli...
WebInstruct: https://huggingface.co/datasets/TIGER...

TIMESTAMPS:
0:00 AI model distillation (Whisper, Flux, Minitron, gpt-4o-mini?)
0:46 Video Overview - Distillation Tutorial and Code Walk-through
2:00 Distillation Examples (Diffusion - Flux Schnell / Dev; Transcription - Distil-Whisper; LLMs - Nvidia Minitron)
6:51 How distillation works
7:22 Student model initialization
8:36 Layer / depth pruning
11:52 Width pruning
15:25 Pre-training versus distillation
18:40 Cross-entropy loss vs KL-divergence
22:41 Instruction fine-tuning
23:28 Distilling SmolLM 135M to a 99M model
24:43 Code walk-through setup
26:49 Pruning Notebook
28:56 Layer Pruning
31:41 Width Pruning
35:01 Why does pruning work?
36:17 Distillation Script - Multi-GPU Setup
39:36 Distillation Script Walk-through
54:05 Distillation Configuration File Walk-through
56:32 Distillation Startup and Performance Monitoring with TensorBoard
1:03:01 Instruction fine-tuning and dataset selection
1:09:02 Instruction FT Startup and Performance Monitoring with TensorBoard
1:12:40 Running inference to evaluate distillation performance
1:12:54 Teacher model performance (base SmolLM 135M)
1:13:53 SmolLM Instruct model performance
1:14:15 Raw pruned model performance (layer pruned, 99M)
1:14:38 Width + layer pruning performance (raw, 99M)
1:15:18 Distilled model performance (before instruction tuning, 99M)
1:15:57 Instruction tuning performance evaluation
1:16:21 SmolLM 135M Instruct performance
1:17:17 Instruction-tuned distilled model performance (99M model)
1:18:33 Final Tips (best pruning approach; learning rate, batch size, and model size effects)
1:20:21 Video Resources
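For the cross-entropy vs KL-divergence segment (18:40): the core of a distillation loss is the KL divergence between temperature-softened teacher and student distributions. Below is a minimal, framework-free sketch of that idea; the function names, temperature value, and T² scaling convention are illustrative assumptions, not taken from the Trelis scripts.

```python
import math

def softmax(logits, temperature=1.0):
    # Softmax with temperature: higher T gives a softer (flatter) distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): how far the student distribution q is from the teacher p.
    # Zero when p == q, positive otherwise.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL between softened teacher and student distributions, scaled by T^2
    # (a common convention so gradient magnitudes stay comparable across T).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return (temperature ** 2) * kl_divergence(p, q)

# Toy per-token example: three vocabulary entries, made-up logits.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
loss = distillation_loss(teacher, student)
```

In a real training script this would be computed in a batched, vectorized form (e.g. with a framework KL-divergence op over logits), but the per-token arithmetic is the same.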