QLoRA: Efficient Finetuning of Quantized Large Language Models (Tim Dettmers)
Recent open-source large language models (LLMs) like LLaMA and Falcon are both high-quality and provide strong performance for their memory footprint. However, finetuning these LLMs is still challenging on consumer and mobile devices, with a 32B LLaMA model requiring 384 GB of GPU memory for finetuning. In this talk, I introduce QLoRA, a technique that reduces the memory required for finetuning LLMs by roughly 17 times, making a 32B LLM finetunable on 24 GB consumer GPUs and 7B language models finetunable on mobile devices. The talk provides a self-contained introduction to quantization and discusses the critical factors that allow QLoRA to use 4-bit precision for LLM finetuning while still replicating full 16-bit finetuning performance. I also discuss the evaluation of LLMs and how we used insights from our LLM evaluation study to build one of the most powerful open-source chatbots, Guanaco.

Speaker Bio (Tim Dettmers): Tim is a PhD student at the University of Washington, advised by Luke Zettlemoyer, working on efficient deep learning to make training, finetuning, and inference of deep learning models more accessible, in particular to those with the least resources. Tim is the maintainer of bitsandbytes, a widely used machine learning library for 4-bit and 8-bit quantization with 200k pip installations per month. He has a background in applied math and industrial automation. / tim_dettmers

*** Hosted by Denys Linkov and the MLOps Discord community: / discord
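The memory figures in the abstract can be sanity-checked with back-of-the-envelope arithmetic. This is a minimal sketch, not from the talk itself: it assumes full 16-bit finetuning with Adam costs roughly 12 bytes per parameter (2 for weights, 2 for gradients, 8 for optimizer moments), while QLoRA keeps the frozen base weights in 4 bits (~0.5 bytes per parameter) plus an assumed few gigabytes of overhead for LoRA adapters and activations. The `overhead_gb` value here is a hypothetical placeholder, not a number reported by the authors.

```python
def full_finetune_gb(n_params: float) -> float:
    """Estimated GPU memory for full 16-bit Adam finetuning.

    Assumes ~12 bytes/param: 2 (fp16 weights) + 2 (fp16 grads)
    + 8 (fp32 Adam first and second moments).
    """
    return n_params * 12 / 1e9


def qlora_gb(n_params: float, overhead_gb: float = 6.0) -> float:
    """Estimated GPU memory for QLoRA-style finetuning.

    Assumes frozen 4-bit base weights (~0.5 bytes/param) plus an
    assumed flat overhead for LoRA adapters, activations, and paging.
    """
    return n_params * 0.5 / 1e9 + overhead_gb


full = full_finetune_gb(32e9)  # 32B params -> 384 GB, matching the abstract
qlora = qlora_gb(32e9)         # ~22 GB, within a 24 GB consumer GPU
print(f"full: {full:.0f} GB, qlora: {qlora:.0f} GB, ratio: {full / qlora:.1f}x")
```

Under these assumptions the ratio comes out near the "roughly 17 times" reduction the abstract cites, and the QLoRA estimate fits the 24 GB consumer-GPU budget mentioned for a 32B model.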