ONNX Community Meetup 2023: INT8 Quantization for Large Language Models with Intel Neural Compressor
The explosive growth of large language models (LLMs) has enabled breakthroughs in fields like text analysis, language translation, and chatbot technologies. However, deploying LLMs is a formidable challenge because of their enormous parameter counts (e.g., over 700 GB of memory is required to run the BLOOM-176B model in FP32), which make them impractical to run on commodity hardware. Users therefore have an ongoing demand for compression methods that reduce an LLM's memory footprint while maintaining comparable accuracy, a setting where general quantization recipes may not work. To compress LLMs with reasonable accuracy, Intel® Neural Compressor integrates and enhances the SmoothQuant algorithm, which addresses the compression challenge by compensating for the accuracy loss introduced by activation quantization. Our team has validated the efficacy of this solution on numerous LLMs, such as GPT-J, LLaMA, and BLOOM, achieving promising latency on Intel hardware. Furthermore, Intel® Neural Compressor closes the gap in exporting INT8 PyTorch models to ONNX format, making these models ready for production deployment. We continue to upload ONNX models to the ONNX Model Zoo and the Hugging Face Hub (e.g., GPT-J and Whisper-large), contributing them back to the ONNX community.
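As a rough illustration of the quantization workflow described above, here is a minimal sketch of applying Intel Neural Compressor's post-training SmoothQuant recipe to a Hugging Face causal LM, following the 2.x-style API. The model name, calibration text, and one-sample dataloader are placeholders for illustration (a real run would calibrate on a few hundred representative samples), and exact configuration fields may differ across Neural Compressor releases.

```python
# Hedged sketch: post-training INT8 quantization with the SmoothQuant recipe
# via Intel Neural Compressor (2.x-style API). The model name and calibration
# data below are illustrative assumptions, not part of the original talk.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor import PostTrainingQuantConfig, quantization

model_name = "EleutherAI/gpt-j-6b"  # assumption: any HF causal LM could stand in
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torchscript=True)
model.eval()

# Toy one-sample calibration set; real runs calibrate on many more samples.
ids = tokenizer("Intel Neural Compressor compresses LLMs.", return_tensors="pt")
calib_dataloader = DataLoader([(ids["input_ids"][0], 0)], batch_size=1)

# SmoothQuant migrates quantization difficulty from activations (which have
# large outliers in LLMs) to weights; alpha controls that migration strength,
# with 0.5 as a common starting point.
conf = PostTrainingQuantConfig(
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}}
)
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
q_model.save("./gptj-int8")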
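The abstract also mentions closing the INT8 PyTorch-to-ONNX export gap. The sketch below shows that step using Neural Compressor's export API with a Torch2ONNXConfig, as documented for INC 2.x; the dummy input shape, tensor names, opset version, and output filename are assumptions for illustration, and q_model is the quantized model from the previous sketch.

```python
# Hedged sketch: export the quantized PyTorch model (q_model from the previous
# step) to an INT8 ONNX model in QDQ format. Input/output names, dummy input,
# and dynamic axes are illustrative assumptions.
import torch
from neural_compressor.config import Torch2ONNXConfig

int8_onnx_config = Torch2ONNXConfig(
    dtype="int8",
    opset_version=14,
    quant_format="QDQ",  # QuantizeLinear/DequantizeLinear node pairs
    example_inputs=torch.ones(1, 32, dtype=torch.long),  # dummy token ids
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "logits": {0: "batch", 1: "sequence"},
    },
)
q_model.export("gptj-int8.onnx", int8_onnx_config)
```

The exported model can then be loaded with ONNX Runtime for production inference.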