BrainTrust reveals the game-changing method to make AI agents perform under pressure. Ben Kus, CTO at Box, sits down with Ankur Goyal, CEO of BrainTrust, to discuss the importance of AI agent evaluation and observability. They zoom in on how AI evals are used to test and ensure the accuracy of AI agents, especially in complex environments where non-determinism can complicate results. Ankur shares his journey from developing AI solutions in document processing to building BrainTrust, a tool designed to address these challenges. The discussion also touches on the evolving role of product managers in guiding AI development through better evaluation practices, and on how AI agents are becoming a critical part of enterprise systems.

Key Moments:
The evolution of AI evals: transitioning from traditional benchmarks to AI-specific evaluations.
Non-determinism in AI: why AI outputs can vary and how to measure accuracy in dynamic environments.
AI observability: a deep dive into how AI agents' behavior in production can be monitored and improved.
The role of product managers: shifting from requirements documents to defining successful AI agent behavior.
Embracing failure: why failing evals can be an opportunity to refine AI tools and models.
Testing AI agents in production: practical strategies for evaluating agent performance in real-world enterprise environments.
Jump into the conversation:
(00:00) Introduction to evaluating AI agents and why LLMs help in evaluation
(00:39) Ankur Goyal shares his journey from AI document processing to BrainTrust
(02:31) Building BrainTrust to address common AI problems across companies
(03:01) Defining evals and how they work in AI, similar to traditional software benchmarking
(03:59) The challenge of accuracy in AI versus traditional software systems
(04:22) AI's non-determinism and how it affects the output's correctness
(05:11) The evolution of AI observability and how it differs from traditional methods
(06:43) Unexpected behavior in AI and its relationship to model drift
(07:03) Non-determinism and complexity in AI agents' decision-making
(07:57) The significance of AI evals as the new PRDs in product management
(09:10) Transitioning from simple automation to evaluating more complex AI behaviors
(10:32) Evaluating AI agents' results similar to how people are tested
(12:03) AI output evaluation through comparisons, like the Magna Carta example
(13:12) Non-determinism's impact on enterprise AI use cases and the importance of careful validation
(15:12) Advice on handling non-determinism when working with financial data in AI
(17:40) Using multiple paths for validation and the importance of cross-checking results
(20:34) Distinguishing marketing evals from internal evals in AI product development
(22:12) The critical role of context in evaluating AI output accuracy
(24:05) Moving beyond golden datasets to more dynamic evaluation methods
(26:03) Internal evals as the cornerstone of reliable AI product development
(27:16) The challenge of defining "perfect" datasets and managing unpredictable outputs
(29:40) Applying eval principles to enterprise platforms and external AI tools
(32:16) Promoting transparency in AI evaluation with vendors and within teams
(34:45) Final advice for enterprises to avoid failure when deploying agentic capabilities