Fast Finetuning of Gemma-3, Qwen-3 and GPT-OSS on Strix Halo using Unsloth and Multi-Node Setups
In this video, I introduce an update to the Strix Halo fine-tuning toolbox that adds two major improvements: Unsloth integration and multi-node distributed training. The setup builds on the previous fine-tuning tutorial, but now leverages Unsloth's highly optimized Triton kernels to reduce VRAM usage and speed up training for models like Gemma 3. I cover the software details that make this possible: how Unsloth dynamically patches the Hugging Face Transformers library, and why standard PyTorch Autograd is less efficient for these specific architectures. I also show a side-by-side comparison of full fine-tuning and LoRA, demonstrating the large memory and speed advantages Unsloth provides, along with the specific commits and patches required to get it running on ROCm.

For the distributed training side, I walk through running DDP (Distributed Data Parallel) and FSDP (Fully Sharded Data Parallel) across a 2-node Strix Halo cluster. Just like with vLLM, the main blocker here was missing RCCL support for gfx1151 in upstream ROCm. I explain how I incorporated my patched RCCL library into the toolbox, allowing us to split training workloads across multiple machines using either RDMA or standard Ethernet, and how to reproduce the setup using my cluster management scripts.

Timestamps
00:00 – Introduction
03:26 – Starting the Toolbox
07:00 – How Unsloth Works (vs PyTorch Autograd)
10:15 – Unsloth Training Demo & Benchmarks
16:50 – Unsloth Patching & Implementation Details
20:28 – Multi-Node Cluster Setup
24:08 – DDP vs FSDP Training Strategies
27:00 – Multi-Node Training Demo
29:53 – Conclusion

Links & Resources
Strix Halo Toolboxes & Guides: https://strix-halo-toolboxes.com
Strix Halo Fine-Tuning Toolbox: https://github.com/kyuz0/amd-strix-ha...
LLM Chronicles (Gradient Descent Deep Dive): https://llm-chronicles.com
DDP vs. FSDP in PyTorch: https://www.jellyfishtechnologies.com...
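The LoRA workflow discussed in the video follows Unsloth's standard recipe: load the model through `FastLanguageModel`, attach low-rank adapters, and train with TRL's `SFTTrainer`. Below is an untested illustrative sketch of that shape — the model name, dataset, and every hyperparameter are example values, not the exact settings from the demo, and running it requires a supported GPU (on Strix Halo, the patched ROCm/Unsloth stack from the toolbox):

```python
# Illustrative Unsloth LoRA fine-tuning sketch. Model, dataset, and all
# hyperparameters are placeholders, not the exact values from the video.
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Load the base model with Unsloth's patched (Triton-kernel) code paths.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",  # example model
    max_seq_length=2048,
    load_in_4bit=False,  # set True for a QLoRA-style, lower-VRAM run
)

# Attach LoRA adapters: only these low-rank matrices receive gradients,
# which is where the memory savings over full fine-tuning come from.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("yahma/alpaca-cleaned", split="train")  # example dataset

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # assumes the dataset is formatted for SFT
    args=SFTConfig(per_device_train_batch_size=2, max_steps=60,
                   output_dir="outputs"),
)
trainer.train()
```

The same script shape applies to the other models covered (Qwen 3, GPT-OSS); only the model name and the memory-related knobs (`load_in_4bit`, batch size, sequence length) change.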
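Multi-node PyTorch jobs of this kind are typically launched with `torchrun`, one invocation per node. A hypothetical 2-node launch — the IP address, port, and script name are placeholders, not values from the video, where the cluster management scripts handle this step:

```shell
# Node 0 (rendezvous master, placeholder IP 192.168.1.10):
torchrun --nnodes=2 --node_rank=0 --nproc_per_node=1 \
  --master_addr=192.168.1.10 --master_port=29500 train.py

# Node 1:
torchrun --nnodes=2 --node_rank=1 --nproc_per_node=1 \
  --master_addr=192.168.1.10 --master_port=29500 train.py
```

Whether the ranks then talk over RDMA or plain Ethernet is decided by the RCCL transport configuration, not by these launch flags.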
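The core collective that DDP relies on is an all-reduce that averages gradients across ranks, which is exactly the operation that needs working RCCL support on gfx1151. A minimal sketch of that averaging step, run here as a single-rank CPU "gloo" group so it works anywhere (on the actual 2-node cluster you would launch one process per node and use the ROCm RCCL backend, selected in PyTorch as `"nccl"`):

```python
# Sketch of DDP's gradient averaging: all-reduce the local gradients, then
# divide by the world size. Uses a single-rank CPU "gloo" group so it runs
# anywhere; a real multi-node run would use the (patched) RCCL backend.
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

world_size = dist.get_world_size()
local_grad = torch.tensor([float(dist.get_rank() + 1)])  # this rank's gradient
dist.all_reduce(local_grad, op=dist.ReduceOp.SUM)        # sum across all ranks
local_grad /= world_size                                 # average; identical on every rank
print(local_grad.item())
dist.destroy_process_group()
```

DDP keeps a full model replica on each node and only exchanges gradients this way; FSDP additionally shards the parameters and optimizer state across nodes, trading extra communication for a lower per-node memory footprint.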