In this Advancing AI 2024 Luminary Developer Keynote, Dr. Lianmin Zheng introduces SGLang, a high-performance serving framework optimized for inference with LLMs and vision-language models. SGLang's core techniques include RadixAttention for improved KV cache reuse and jump-forward decoding for faster grammar-guided decoding. Additional optimizations, such as low-overhead CPU scheduling and PyTorch-native enhancements (e.g., torch.compile and torchao), further improve efficiency. Benchmark results demonstrate that SGLang outperforms other state-of-the-art inference engines. An open-source project with broad adoption, SGLang is also deployed for production serving at xAI.

Speaker: Lianmin Zheng, xAI

Gain access to AMD developer tools and resources: https://www.amd.com/en/developer.html...

The information contained in this video represents the view of AMD or the third-party presenter as of the date presented. AMD and/or the third-party presenters have no obligation to update any forward-looking content in the above presentations. AMD is not responsible for the content of any third-party presentations and does not necessarily endorse the comments made therein. GD-84.

© 2024 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.
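To make the KV-cache-reuse idea behind RadixAttention concrete, here is a toy sketch: requests that share a leading token prefix can reuse the KV entries already computed for that prefix, so only the unmatched suffix needs prefill. This is a simplified conceptual illustration, not SGLang's actual implementation (which manages GPU memory, eviction, and a compressed radix tree); all class and method names below are invented for this example.

```python
# Toy illustration of prefix-based KV cache reuse, the idea behind
# RadixAttention. NOT SGLang's implementation; names are hypothetical.

class RadixNode:
    def __init__(self):
        self.children = {}   # token id -> RadixNode
        self.kv_slot = None  # stand-in for a cached KV entry for this token

class PrefixCache:
    def __init__(self):
        self.root = RadixNode()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached KV entries."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched

    def insert(self, tokens):
        """Record (placeholder) KV entries for every token of a request."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, RadixNode())
            node.kv_slot = object()  # stand-in for real KV memory

cache = PrefixCache()
cache.insert([1, 2, 3, 4])                 # first request fills the cache
reused = cache.match_prefix([1, 2, 3, 9])  # second request shares a 3-token prefix
print(reused)  # 3 -> only token 9 needs a fresh prefill pass
```

In a real server the shared prefix is typically a common system prompt or few-shot template, so this reuse can skip most of the prefill work for each follow-up request.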