In Network Collective acceleration for AI Fabrics
Surendra Anubolu, Distinguished Engineer, Broadcom Inc.
Nikhil Shetty, Consulting Member of Technical Staff, Oracle

For AI/ML (and HPC) workloads, collectives such as AllReduce, ReduceScatter, and AllGather account for a critical portion of job completion time. GPU-based collectives are limited more by communication bandwidth than by computation. Network switches are a natural place to offload these operations, since they offer high radix and can sink high throughput. Performing the reduction operation in the switch fabric (in-network) can cut network bandwidth in half compared with what is achievable with GPU-based reductions, which additionally enables higher MFU and a lower memory footprint at the GPU endpoints. We will present an INC (in-network collective) offload solution for Ethernet networks, implemented in a high-performance, low-latency Ethernet switch, and share performance measurements and application benefits from offloading collective operations to the network. We will also provide an update on standards activity in relevant communities such as the UEC (Ultra Ethernet Consortium) and OCP SAI.
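The roughly 2x bandwidth saving can be seen from a back-of-the-envelope comparison. A minimal sketch, assuming a ring AllReduce as the GPU-based baseline (the abstract does not specify which GPU algorithm is being compared): a ring AllReduce moves about 2(N-1)/N times the payload over each GPU's link, while an in-network reduction sends the payload toward the switch once and receives the reduced result once.

```python
# Hypothetical bandwidth comparison: ring AllReduce (GPU-based baseline,
# an assumption for illustration) vs. an in-network reduction.

def ring_allreduce_bytes_per_link(n_gpus: int, payload: int) -> float:
    # Ring AllReduce = ReduceScatter + AllGather; each phase moves
    # (N-1)/N of the payload per direction on every GPU link.
    return 2 * (n_gpus - 1) / n_gpus * payload

def in_network_reduce_bytes_per_link(n_gpus: int, payload: int) -> float:
    # In-network collective: each GPU sends its full payload once toward
    # the switch (which performs the reduction) and receives the result
    # once, so one payload's worth of traffic per direction per link.
    return float(payload)

N, S = 1024, 1 << 30  # e.g. 1024 GPUs, 1 GiB payload
ring = ring_allreduce_bytes_per_link(N, S)
inc = in_network_reduce_bytes_per_link(N, S)
print(f"ring: {ring / 2**30:.3f} GiB/link, "
      f"INC: {inc / 2**30:.3f} GiB/link, "
      f"ratio: {ring / inc:.2f}x")
```

For large N the ratio approaches 2x, which matches the abstract's claim that in-fabric reduction can halve network bandwidth relative to GPU-based reductions.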