vLLM: Virtual LLM

Join us as we dive into vLLM, the fastest and most efficient open-source LLM inference and serving engine! We'll break down how vLLM redefines optimization: not by altering model weights with quantization techniques like GPTQ or AWQ, but by revolutionizing memory management and throughput. Learn how innovations like PagedAttention integrate with FlashAttention, how vLLM tackles memory fragmentation, and why it's the ultimate smart library system for your AI workflows. Whether you're building high-performance LLM apps or exploring cutting-edge research, this session will equip you with actionable insights to level up your projects! (A minimal sketch of the offline inference API discussed in the session follows the timestamps below.)

Join us every Wednesday at 1pm EST for our live events. SUBSCRIBE NOW to get notified!

Speakers:
Dr. Greg, Co-Founder & CEO, AI Makerspace / gregloughane
The Wiz, Co-Founder & CTO, AI Makerspace / csalexiuk

Apply for The AI Engineering Bootcamp on Maven today! https://bit.ly/AIEbootcamp
LLM Foundations - Email-based course: https://aimakerspace.io/llm-foundations/
For team leaders, check out: https://aimakerspace.io/gen-ai-upskil...

Join our community to start building, shipping, and sharing with us today! / discord

How'd we do? Share your feedback and suggestions for future events: https://forms.gle/z96cKbg3epXXqwtG6

#inference #serverroom

Timestamps:
00:00:00 Introduction to Virtual LLMs
00:03:49 Understanding vLLM: The Future of High-Throughput Memory Systems
00:07:55 Efficient Inference and Serving Tools
00:11:43 Understanding Inference Engines and Servers
00:15:41 Challenges in LLM Performance
00:19:16 Efficient Memory Caching in Key-Value Systems
00:23:36 Understanding Attention Layers in Deep Learning
00:27:23 Optimizing KV Cache with Memory Management Techniques
00:31:14 Understanding Memory Fragmentation and Allocation
00:35:25 Exploring the Future of Artificial Intelligence
00:39:23 Optimizing GPU and Memory Utilization for AI Models
00:42:59 Optimizing Offline Model Inference
00:46:24 Optimizing GPU Performance with Tensor Parallelism
00:50:05 vLLM: Fast, Easy, and Efficient Serving for LLMs
00:53:29 Advances in Kernel Fusion for GPU Optimization
00:57:01 Optimizing GPU Utilization with Latest Tools
01:00:34 Building and Shipping Through the Holidays
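To make the offline-inference and tensor-parallelism topics above concrete, here is a minimal sketch using vLLM's documented offline Python API (`LLM` and `SamplingParams`). The model name, prompts, and parameter values are illustrative placeholders, not the exact configuration used in the session.

```python
# Minimal vLLM offline-inference sketch. The model and settings below are
# illustrative placeholders, not the configuration from the session.
from vllm import LLM, SamplingParams

# PagedAttention manages the KV cache in fixed-size blocks under the hood;
# gpu_memory_utilization caps how much VRAM vLLM pre-allocates for weights
# plus KV cache, and tensor_parallel_size shards the model across GPUs.
llm = LLM(
    model="facebook/opt-125m",   # placeholder model
    gpu_memory_utilization=0.90,
    tensor_parallel_size=1,      # set to your GPU count to shard the model
)

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "Explain what PagedAttention does in one sentence.",
    "Why does KV-cache fragmentation hurt LLM throughput?",
]

# generate() batches the prompts together and returns one RequestOutput
# per prompt, each carrying the prompt and its generated completions.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"{output.prompt!r} -> {output.outputs[0].text!r}")
```

For online serving rather than offline batch inference, the same engine is exposed through an OpenAI-compatible HTTP server (e.g. `vllm serve <model>`), which is closer to the serving use case the session covers.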
