This session introduces NVIDIA Dynamo, a new inference serving framework designed to deploy reasoning large language models (LLMs) in multi-node environments. We'll explore the key components and architecture of the framework, highlighting how they enable seamless scaling within data centers and drive advanced inference optimization. We'll also cover cutting-edge inference serving techniques, including disaggregated serving, which optimizes request handling by separating the prefill and decode phases, increasing the number of inference requests served. You will also learn how to quickly deploy this serving framework using NVIDIA NIM.

Speakers:
Harry Kim, Principal Product Manager, NVIDIA
Neelay Shah, Principal Software Architect, NVIDIA
Ryan Olson, Distinguished Engineer / Solutions Architect, NVIDIA
Tanmay Verma, Senior System Software Engineer, NVIDIA

Replay of NVIDIA GTC Session ID S73042.
Level: Technical – Advanced
NVIDIA technology: TensorRT, DALI, NVLink / NVSwitch, and Triton

Login and join the free NVIDIA Developer Program to download the PDF: https://www.nvidia.com/en-us/on-deman...
Find more #GTC25 sessions via NVIDIA on demand: https://www.nvidia.com/en-us/on-deman...
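To make the disaggregated-serving idea mentioned above concrete, here is a minimal conceptual sketch in Python. It is a toy illustration of the general pattern (prompt processing and token generation handled by separate workers that hand off a KV cache), not Dynamo's actual API; all class and function names here are invented for illustration.

```python
from dataclasses import dataclass, field

# Toy model of disaggregated serving: the prefill phase (compute-bound,
# processes the whole prompt once) and the decode phase (memory-bound,
# generates one token at a time) run as separate workers, so each can
# be scheduled and scaled independently. All names below are
# illustrative, not Dynamo's real API.

@dataclass
class KVCache:
    """Stand-in for the key/value cache produced by prefill."""
    prompt: str
    entries: list = field(default_factory=list)

class PrefillWorker:
    """Processes the full prompt in one pass, producing a KV cache."""
    def run(self, prompt: str) -> KVCache:
        # Real prefill is one forward pass over all prompt tokens;
        # here we just record one cache "entry" per whitespace token.
        return KVCache(prompt=prompt, entries=prompt.split())

class DecodeWorker:
    """Generates output tokens one at a time, reusing the KV cache."""
    def run(self, cache: KVCache, max_tokens: int = 3) -> list:
        out = []
        for i in range(max_tokens):
            # Real decode appends to the cache per generated token;
            # we fake generation deterministically.
            tok = f"<tok{i}>"
            cache.entries.append(tok)
            out.append(tok)
        return out

def serve(prompt: str) -> list:
    # In a real disaggregated deployment the cache is transferred
    # between nodes (e.g. over NVLink or RDMA); here it is simply
    # passed in-process to keep the sketch self-contained.
    cache = PrefillWorker().run(prompt)
    return DecodeWorker().run(cache)

print(serve("why do reasoning models need many tokens"))
```

Because the two phases have different bottlenecks, splitting them lets a deployment provision prefill and decode capacity separately, which is the source of the throughput gain the session describes.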