Published 11 months ago
Deep Dive: Quantizing Large Language Models, part 1

Quantization is an effective technique for compressing Large Language Models (LLMs) and accelerating their inference. In this video, we introduce model quantization, build an intuition for rescaling weights and activations, and discuss the problems rescaling creates. We then cover the main types of quantization: dynamic post-training quantization, static post-training quantization, and quantization-aware training. Finally, we look at and compare actual quantization techniques: PyTorch, ZeroQuant, and bitsandbytes. In part 2 (• Deep Dive: Quantizing Large Language ...), we look at and compare more advanced quantization techniques: SmoothQuant, GPTQ, AWQ, HQQ, and the Hugging Face Optimum Intel library, based on Intel Neural Compressor and Intel OpenVINO.

Slides: https://fr.slideshare.net/slideshow/j...

Don't forget to subscribe to be notified of future videos. Follow me on Medium at /julsimon or on Substack at https://julsimon.substack.com.

Chapters:
00:00 Introduction
02:05 What is quantization?
06:50 Rescaling weights and activations
08:17 The mapping function
12:38 Picking the input range
16:15 Getting rid of outliers
19:50 When can we apply quantization?
26:00 Dynamic post-training quantization with PyTorch
28:42 ZeroQuant
34:50 bitsandbytes
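To make the "mapping function" and "picking the input range" chapters concrete, here is a minimal sketch of affine (asymmetric) int8 quantization with NumPy. The function names `quantize_int8` and `dequantize` are illustrative, not from any library; the input range is taken naively as the tensor's min/max, which is exactly what makes outliers a problem.

```python
import numpy as np

def quantize_int8(x):
    """Affine mapping q = round(x / scale) + zero_point into int8."""
    qmin, qmax = -128, 127
    # Naive range selection: the full min/max of the input.
    # A single outlier stretches the scale and wastes precision.
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Approximate reconstruction of the original float values."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize(q, scale, zp)
# Per-element reconstruction error is bounded by roughly one scale step.
```

Replacing the min/max above with, say, a percentile of the observed distribution is one common way to "get rid of outliers" at the cost of clipping extreme values.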
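The dynamic post-training quantization discussed at 26:00 can be tried in a few lines with PyTorch's built-in `torch.ao.quantization.quantize_dynamic` API. The toy `nn.Sequential` model below is a stand-in for a real LLM, used only to keep the sketch self-contained.

```python
import torch
import torch.nn as nn

# Toy model standing in for an LLM's linear layers (illustrative only).
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
model.eval()

# Dynamic post-training quantization: weights are converted to int8 ahead
# of time; activations are quantized on the fly at inference time, so no
# calibration dataset is needed.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
with torch.no_grad():
    out = qmodel(x)
```

Because activation ranges are computed per input batch, dynamic quantization trades a little runtime overhead for not having to pick input ranges in advance; static post-training quantization moves that work to a calibration step.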
