Learn more about SuperAI: superai.com
Follow us on X: x.com/superai_conf

Keynote: Scaling Generative AI Inference at Trillion-Token Scale
Speaker: Gyeong-In Yu, CTO @ FriendliAI
Stage: WEKA Stage
Recorded on 18 June 2025

#superai #friendliai #genai #inference #aiinfrastructure

Join Gyeong-In Yu, CTO of FriendliAI, as he explores the challenges and solutions surrounding generative AI inference at trillion-token scale. As AI technology advances rapidly, the demand for efficient, scalable AI inference grows with it. This session examines the factors driving that demand, including the rise of agentic AI and the shift toward using GPUs primarily for inference rather than training.

Yu presents key strategies for tackling rising computational needs and infrastructure management, emphasizing the need to reduce GPU costs and to apply specialized optimizations tailored to different AI applications. By prioritizing rapid response times and high generation quality, FriendliAI aims to balance operational efficiency with superior service delivery.

The talk offers insights into optimization techniques such as batching, quantization, and caching that improve computational efficiency. Yu introduces FriendliAI's proprietary FP8 quantization method, which preserves model quality while accelerating execution. He also showcases Friendli Suite, a platform offering services such as Friendli Containers and managed cloud solutions for cost-effective, high-performance AI inference.
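To give a feel for what FP8 quantization does, here is a minimal, self-contained sketch that simulates per-tensor quantization to an E4M3-style FP8 format (4 exponent bits, 3 mantissa bits, max value 448) by scaling values into range and rounding the mantissa to 3 bits. This is purely illustrative and is not FriendliAI's proprietary method; the function name and the per-tensor scaling scheme are assumptions for the example.

```python
import math

def quantize_dequantize_fp8_e4m3(values, max_fp8=448.0):
    """Illustrative sketch: simulate per-tensor FP8 (E4M3) quantization.

    Scale the tensor so its largest magnitude maps to the FP8 max (448),
    snap each scaled value to the nearest number representable with a
    3-bit mantissa, then scale back. The round trip shows the kind of
    relative error (at most ~6.25% for E4M3) that FP8 introduces.
    """
    amax = max(abs(v) for v in values) or 1.0
    scale = max_fp8 / amax  # per-tensor scale factor (assumed scheme)
    out = []
    for v in values:
        s = v * scale
        if s == 0.0:
            out.append(0.0)
            continue
        # Spacing between representable FP8 values at this exponent:
        # 2^(e - 3) for a 3-bit mantissa.
        e = math.floor(math.log2(abs(s)))
        step = 2.0 ** (e - 3)
        q = round(s / step) * step  # round to nearest representable value
        out.append(q / scale)
    return out

# Hypothetical weight values for illustration.
weights = [0.013, -1.7, 3.14159, 0.5]
approx = quantize_dequantize_fp8_e4m3(weights)
```

Each reconstructed value stays within the E4M3 rounding bound of its original, which is why FP8 can halve memory traffic relative to FP16 while keeping model quality close to the full-precision baseline.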