The Hardest Problem in AI: Evaluation in 2025 with Ian Cairns
In this episode of the ODSC AI Podcast, host Sheamus McGovern speaks with Ian Cairns, cofounder and CEO of Freeplay, a platform built to help teams evaluate, monitor, and iterate on LLM and agent-based systems in production. Ian brings a deep product background from Twitter, Gnip, and Mapbox, and offers an insider’s look into what it actually takes to make AI work beyond the prototype phase. The conversation centers on evaluation, widely regarded as one of the most difficult and underdeveloped aspects of deploying AI in 2025.

Key Topics Covered:
1. The real-world AI maturity curve: from vibe prompting to production
2. Offline vs. online evaluation: definitions, trade-offs, and tooling
3. Why teams struggle post-deployment, and how to break through the “we don’t know what’s going wrong” phase
4. Evaluation challenges with agents, memory, RAG, and tool use
5. The role of observability, telemetry, and human-in-the-loop review
6. Lessons learned from Freeplay customers, including Postscript
7. The growing importance of domain experts in evaluation workflows
8. Building multi-layer eval architectures for agent systems
9. Voice agent challenges, like turn detection and latency
10. Emerging roles like AI Evaluation Engineer and how orgs should staff for evaluation maturity