LLM Eval Tools Compared: Braintrust
Join the AI Evals Course starting March 16, 2026: https://maven.com/parlance-labs/evals...

Wayde Gilliam from Braintrust demonstrates his evaluation-framework approach in part 2 of the Evals Bake-Off series. The expert panel evaluates Braintrust's workflow for building production-ready LLM evaluation systems.

Associated blog post: https://hamel.dev/blog/posts/eval-tools/
Playlist for series: • Mystery Data Science Theatre

Wayde showcases a unique approach: recruiting his family as subject matter experts to generate real user queries and validate system outputs. The panel discusses critical trade-offs between automation and human oversight, UI design decisions, and the importance of domain expertise in evaluation workflows.

Judges & Panel:
Bryan Bischof, Head of AI, Theory Ventures
Hamel Husain, Independent Developer
Shreya Shankar, Data Systems Researcher

Topics Covered:
Subject matter expert involvement in evaluation design
Dataset creation from real user feedback
Loop AI agent for automated scoring and optimization
Instrumentation and tracing with decorators
Custom views and failure mode taxonomy
Open coding and axial coding workflows
Automation vs. manual review trade-offs
UI/UX comparison with other evaluation tools

Key Discussion Points:
The value of real domain experts vs. synthetic data
Risks of premature optimization with AI-generated evaluators
When to use automation in the evaluation loop
Notebook workflows vs. custom UI tools
Stacking abstractions in evaluation systems
The importance of objective functions in prompt optimization
Custom visualization for failure mode analysis

Timestamps:
00:00 - Introduction to Braintrust Review
01:05 - Family as Subject Matter Experts Approach
03:01 - System Prompt Development in Playground
05:11 - Loop AI Agent for Automated Scoring
06:42 - Panel Critique: Premature Automation Risks
10:04 - Real User Data vs. Synthetic Generation
13:38 - UI/UX Design Comparison
15:09 - Homework 2: Synthetic Data Process
22:13 - Application Instrumentation with Decorators
27:03 - Trace Viewing and Analysis
28:43 - Open Coding with Subject Matter Experts
32:02 - Custom Failure Mode Taxonomy Views
34:20 - Notebooks vs. UI Tools Debate
38:04 - Final Assessment and Tool Comparison
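The "instrumentation and tracing with decorators" pattern discussed at 22:13 can be sketched in plain Python. This is a minimal, hypothetical illustration of the general idea (all names here are made up for the sketch; it is not the Braintrust SDK's actual API): wrapping an application function so that its inputs, output, and latency are captured as a span an eval tool could later display alongside scores.

```python
import functools
import json
import time

TRACE = []  # collected spans; a real tool would ship these to a backend

def traced(fn):
    """Record each call's inputs, output, and latency as a span (sketch only)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        TRACE.append({
            "name": fn.__name__,
            "input": {"args": list(args), "kwargs": kwargs},
            "output": output,
            "latency_s": round(time.perf_counter() - start, 4),
        })
        return output
    return wrapper

@traced
def answer_query(query: str) -> str:
    # Stand-in for an LLM call in the instrumented application.
    return f"Answer to: {query}"

if __name__ == "__main__":
    answer_query("What is an eval?")
    print(json.dumps(TRACE, indent=2))
```

The appeal of the decorator approach, as discussed in the video, is that instrumentation stays one line per function and does not alter the application's logic.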