How are the economics of AI changing? Why are headlines saying that 95% of AI projects are failing? In this video, Val Bercovici, AI Strategist at WEKA, explains the hidden costs of running AI models in production, the GPU memory bottleneck problem, and how enterprises can optimize their inference costs.

During this talk from the AI Infrastructure Summit 2025, Val speaks about the critical tradeoff between FLOPS (floating-point operations per second) and memory in GPU computing. He reveals why token costs are becoming the determining factor between successful and failed AI implementations, with developers unable to afford the tokens they need even at $2,000/month. The conversation explores how GPU prefill creates the biggest bottleneck in AI inference, why Nvidia pre-announced a processor 18 months in advance specifically for this problem, and the concept of a token warehouse™ that could revolutionize how AI models handle context windows and KV cache.

Val discusses how WEKA's software-defined approach supports Nvidia, AMD, and hybrid cloud deployments, allowing enterprises to be "Switzerland" in the GPU vendor competition while optimizing their infrastructure costs and energy consumption.

Looking ahead to 2026, Val predicts AI agents will evolve from supervised interns requiring constant oversight to autonomous employees that make decisions independently. He also addresses the timeline for quantum computing's impact on AI, explaining how AI is currently accelerating quantum development in a virtuous cycle that won't fully materialize for another 5 to 15 years.

Key Topics Covered:
• Why token economics (or “tokenomics”) determines AI project success or failure
• The memory wall problem in GPU computing and AI inference
• How prompt caching optimization reduces input and output token costs
• GPU prefill bottlenecks and the KV cache decode process
• Multi-vendor hardware strategy: Nvidia vs. AMD for training and inference
• Managing ROI and cash flow as a Chief AI Officer in enterprise AI
• AGI predictions and the evolution of autonomous AI agents by 2026
• Quantum computing timeline and its future impact on AI acceleration
• Energy costs and GPU scarcity in AI data centers

About WEKA:
WEKA provides high-performance data infrastructure for AI, machine learning, and GPU-accelerated workloads. Our software-defined storage system delivers the speed and scalability enterprises need for production AI deployments across cloud, on-premises, and hybrid environments.

🔗 Learn how WEKA solves AI infrastructure challenges: https://www.weka.io/resources/solutio...

👉 Connect with WEKA:
Website: https://www.weka.io?utm_source=youtub...
LinkedIn: https://www.linkedin.com/company/weka...
X: https://x.com/weka?utm_source=youtube...

#AIInfrastructure #TokenEconomics #GPUComputing #AIInference #MachineLearning #WEKA
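Back-of-the-Envelope Sketches:
The talk's token-economics argument is easy to sanity-check with arithmetic. The sketch below uses assumed per-token prices and an assumed agent workload (none of these figures come from the talk) to show how repeatedly re-sending a large context pushes a single developer past a $2,000/month budget, and how prompt caching pulls the bill back down.

```python
# Illustrative token-cost model. All prices and workload numbers are
# assumptions for the sake of arithmetic, not figures from the talk.

PRICE_INPUT = 3.00          # USD per 1M uncached input tokens (assumed)
PRICE_CACHED_INPUT = 0.30   # USD per 1M cache-hit input tokens (assumed 90% discount)
PRICE_OUTPUT = 15.00        # USD per 1M output tokens (assumed)

def monthly_cost(requests_per_day, input_tokens, output_tokens, cache_hit_rate):
    """Estimated monthly spend for a steady agent workload."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    per_request = (uncached * PRICE_INPUT
                   + cached * PRICE_CACHED_INPUT
                   + output_tokens * PRICE_OUTPUT) / 1e6
    return per_request * requests_per_day * 30

# A coding agent that re-sends a 60k-token context 500 times a day.
no_cache = monthly_cost(500, 60_000, 2_000, cache_hit_rate=0.0)
with_cache = monthly_cost(500, 60_000, 2_000, cache_hit_rate=0.9)
print(f"no prompt caching:  ${no_cache:,.0f}/month")   # ~$3,150
print(f"90% cache hit rate: ${with_cache:,.0f}/month") # ~$963
```

Output tokens cost more per token, but the repeated input context dominates in aggregate, which is why caching the prompt prefix moves the total so dramatically.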
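The prefill-versus-decode claim follows from a standard first-order model: prefill performs roughly 2 FLOPs per parameter per prompt token, all in parallel (compute-bound), while decode must stream the full weight set from HBM for every generated token (bandwidth-bound). In the sketch below the hardware numbers are approximate public specs for a single H100 SXM, and the model size and prompt length are assumptions.

```python
# Rough first-order model of why prefill is compute-bound and decode is
# memory-bandwidth-bound. Hardware numbers are approximate; model size
# and prompt length are assumptions for illustration.

PARAMS = 8e9                 # 8B-parameter model, BF16 weights (assumed)
BYTES_PER_PARAM = 2
PEAK_FLOPS = 1.0e15          # ~1 PFLOP/s dense BF16 (approx. H100 SXM)
MEM_BW = 3.35e12             # ~3.35 TB/s HBM3 bandwidth (approx. H100 SXM)

prompt_tokens = 100_000      # long-context prompt (assumed)

# Prefill: every prompt token takes ~2 FLOPs per parameter, and all
# tokens are processed in parallel, so peak compute is the limit.
prefill_flops = 2 * PARAMS * prompt_tokens
prefill_s = prefill_flops / PEAK_FLOPS

# Decode: one token at a time; each step must stream all weights (plus
# the growing KV cache, ignored here) from HBM, so bandwidth is the limit.
weight_bytes = PARAMS * BYTES_PER_PARAM
decode_s_per_token = weight_bytes / MEM_BW

print(f"prefill of {prompt_tokens:,} tokens: ~{prefill_s:.2f} s (compute-bound)")
print(f"decode: ~{decode_s_per_token * 1000:.2f} ms/token "
      f"(~{1 / decode_s_per_token:,.0f} tokens/s, bandwidth-bound)")
```

Prefill cost grows at least linearly with prompt length (and attention adds a quadratic term), which is why a prefill-specialized processor and KV cache reuse both target the same pain point.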
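The token warehouse™ idea responds to a simple capacity problem: the KV cache for long contexts quickly outgrows GPU memory. Here is a sketch of that footprint for a hypothetical 70B-class model with grouped-query attention; the architecture numbers are assumptions chosen to resemble models of that class, not a description of any specific product.

```python
# KV cache footprint sketch for a hypothetical 70B-class model with
# grouped-query attention. All architecture numbers are assumptions.

N_LAYERS = 80
N_KV_HEADS = 8       # grouped-query attention
HEAD_DIM = 128
BYTES = 2            # BF16

# Both keys and values are cached at every layer, hence the factor of 2.
kv_bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES

for context in (8_192, 131_072):
    gb = kv_bytes_per_token * context / 2**30
    print(f"{context:>7,} tokens -> {gb:5.1f} GB of KV cache per sequence")
```

At roughly 40 GB per 128k-token sequence, a handful of concurrent long-context users exhausts an 80 GB GPU, which is the case for tiering cached tokens out to fast shared storage rather than recomputing prefill from scratch.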