NVIDIA's Jet Nemotron - Post Neural Architecture Search & JetBlock (video uploaded to YouTube)
In-depth discussion here: https://open.spotify.com/episode/4pLm...
Documentation here: https://www.nvidia.com/en-us/ai-data-...

NVIDIA's new Jet-Nemotron model family introduces a hybrid-architecture approach to Large Language Models (LLMs) that significantly improves efficiency without sacrificing accuracy. Two key technologies drive this: Post Neural Architecture Search (PostNAS), a method for "retrofitting" existing models by identifying their less critical full-attention layers and replacing them with more efficient ones, and JetBlock, a novel linear attention module. The core idea is that not all attention layers are equally important, so the less critical ones can be swapped out. This drastically reduces the Key-Value (KV) cache size, leading to up to a 53.6x increase in decoding throughput and a potential 98% reduction in inference cost. Jet-Nemotron aims to set a new standard for LLM evaluation that emphasizes real-world performance and hardware efficiency on devices ranging from data centers to the edge, making high-performance AI more economically viable and accessible.
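To make the KV-cache argument concrete, here is a minimal back-of-the-envelope sketch. It assumes a hypothetical 28-layer model with 4 KV heads of dimension 128, 16-bit values, and a 64K-token context. It also assumes that each linear-attention layer keeps a fixed head_dim x head_dim state per head. None of these numbers are Jet-Nemotron's actual configuration, and the `Config` class and helper functions are illustrative names only. The sketch shows why keeping full attention in only a few layers shrinks the cache: full-attention layers store keys and values for every token, while linear-attention layers keep a state whose size does not depend on sequence length.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical model shape, for illustration only.
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # fp16 / bf16


def full_attention_kv_bytes(cfg: Config, num_full_layers: int, seq_len: int, batch: int = 1) -> int:
    # Each full-attention layer caches one key and one value vector
    # per KV head for every token in the context.
    per_token_per_layer = 2 * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_elem
    return batch * seq_len * num_full_layers * per_token_per_layer


def linear_attention_state_bytes(cfg: Config, num_linear_layers: int, batch: int = 1) -> int:
    # Assumption: a linear-attention layer keeps a fixed head_dim x head_dim
    # state per head, independent of sequence length.
    per_layer = cfg.num_kv_heads * cfg.head_dim * cfg.head_dim * cfg.bytes_per_elem
    return batch * num_linear_layers * per_layer


def total_cache_bytes(cfg: Config, num_full_layers: int, seq_len: int, batch: int = 1) -> int:
    # Hybrid model: a few full-attention layers plus linear attention elsewhere.
    num_linear_layers = cfg.num_layers - num_full_layers
    return (full_attention_kv_bytes(cfg, num_full_layers, seq_len, batch)
            + linear_attention_state_bytes(cfg, num_linear_layers, batch))


cfg = Config(num_layers=28, num_kv_heads=4, head_dim=128)
seq_len = 65536

all_full = total_cache_bytes(cfg, num_full_layers=28, seq_len=seq_len)
hybrid = total_cache_bytes(cfg, num_full_layers=2, seq_len=seq_len)
ratio = all_full / hybrid

print(f"all full attention: {all_full / 2**30:.2f} GiB")
print(f"hybrid (2 full layers): {hybrid / 2**20:.1f} MiB")
print(f"cache reduction: {ratio:.1f}x")
```

Under these assumed numbers the cache shrinks by roughly 14x. Decoding is typically memory-bandwidth bound, so a smaller cache lets more sequences fit on a GPU at once, which is one reason throughput can rise. The 53.6x figure quoted above comes from NVIDIA's own measurements and is not reproduced by this toy calculation.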