At Ray Summit 2024, Michael Goin and Robert Shaw from Neural Magic delve into model quantization for vLLM deployments. Their presentation covers vLLM's support for quantization methods including FP8, INT8, and INT4, which are crucial for reducing memory usage and increasing generation speed. In the talk, Goin and Shaw explain the internal mechanisms by which vLLM leverages quantization to accelerate models, and they provide practical guidance on applying these quantization techniques to custom models using vLLM's llm-compressor framework (a sketch of that workflow follows below). This talk offers valuable insights for developers and organizations looking to optimize their LLM deployments, balancing performance and resource efficiency in large-scale AI applications.

--

Interested in more?
Watch the full Day 1 Keynote: • Ray Summit 2024 Keynote Day 1 | Where Buil...
Watch the full Day 2 Keynote: • Ray Summit 2024 Keynote Day 2 | Where Buil...

--

🔗 Connect with us:
Subscribe to our YouTube channel: / @anyscale
Twitter: https://x.com/anyscalecompute
LinkedIn: / joinanyscale
Website: https://www.anyscale.com
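
To make the llm-compressor workflow mentioned above concrete, here is a minimal sketch based on the library's one-shot quantization API. It is not taken from the talk itself: the model ID and output directory are placeholders, and the FP8_DYNAMIC scheme (FP8 weights with dynamic per-token activation scales, requiring no calibration data) is one of several schemes the library supports.

```python
# Minimal llm-compressor sketch: one-shot FP8 dynamic quantization.
# Assumes `pip install llmcompressor`; model ID and output dir are placeholders.
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # hypothetical example model

# Quantize the weights of every Linear layer to FP8, skipping lm_head.
# FP8_DYNAMIC computes activation scales at runtime, so no calibration
# dataset is needed for this scheme.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

oneshot(
    model=MODEL_ID,
    recipe=recipe,
    output_dir="Meta-Llama-3-8B-Instruct-FP8-Dynamic",
)
```

Weight-only INT8 or INT4 schemes follow the same recipe pattern but typically require a small calibration dataset to fit the quantization scales.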
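On the serving side, a quantized checkpoint can then be loaded directly with vLLM's Python API; vLLM reads the quantization config saved in the checkpoint, so no extra flags are needed in the common case. A minimal sketch, with the model path assumed to be the output of the step above:

```python
# Serve the quantized checkpoint with vLLM; the path is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="Meta-Llama-3-8B-Instruct-FP8-Dynamic")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```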