AI Book Club: AI Systems Performance Engineering
February's book is "AI Systems Performance Engineering"!

This is a casual-style event, not a structured presentation on topics. Sometimes the discussion even drifts away from the chapters, but feel free to grab the mic to help steer it back. Feel free to join the discussion even if you have not read the book chapters! :)

Join live events: https://luma.com/ai-builders-and-lear...
Sage on LinkedIn: /sageelliott
Book: https://learning.oreilly.com/library/...
Slides: https://docs.google.com/presentation/...

Want to discuss the contents during the reading week? Join the Flyte MLOps Slack group and search for the "ai-reading-club" channel. https://slack.flyte.org/

-------------------------------------------------

About the book:
Title: AI Systems Performance Engineering
Author: Chris Fregly
Published: November 2025
https://learning.oreilly.com/library/...

Chapters:
1. Introduction and AI System Overview
2. AI System Hardware Overview
3. OS, Docker, and Kubernetes Tuning for GPU-based Environments
4. Tuning Distributed Networking Communication
5. GPU-Based Storage I/O Optimizations
6. GPU Architecture, CUDA Programming, and Maximizing Occupancy
7. Profiling and Tuning GPU Memory Access Patterns
8. Occupancy Tuning, Warp Efficiency, and Instruction-Level Parallelism
9. Increasing CUDA Kernel Efficiency and Arithmetic Intensity
10. Intra-Kernel Pipelining, Warp Specialization, and Cooperative Thread Block Clusters
11. Inter-Kernel Pipelining, Synchronization, and CUDA Stream-Ordered Memory Allocations
12. Dynamic Scheduling, CUDA Graphs, and Device-Initiated Kernel Orchestration
13. Profiling, Tuning, and Scaling PyTorch
14. PyTorch Compiler, OpenAI Triton, and XLA Backends
15. Multinode Inference, Parallelism, Decoding, and Routing Optimizations
16. Profiling, Debugging, and Tuning Inference at Scale
17. Scaling Disaggregated Prefill and Decode for Inference
18. Advanced Prefill-Decode and KV Cache Tuning
19. Dynamic and Adaptive Inference Engine Optimizations
20. AI-Assisted Performance Optimizations and Scaling Toward Multimillion GPU Clusters

Book description: https://learning.oreilly.com/library/...