Building eval systems that improve your AI product
If you’re a premium subscriber, add the private feed to your podcast app at https://add.lennysreads.com

In this episode, we dive into the fast-emerging discipline of AI evaluation with Hamel Husain and Shreya Shankar, creators of AI Evals for Engineers & PMs, the #1 highest-grossing course on Maven. After training 2,000+ PMs and engineers across 500+ companies, Hamel and Shreya reveal the complete playbook for building evaluations that actually improve your AI product: moving beyond vanity dashboards to a system that drives continuous improvement.

In this episode, you’ll learn:
• Why most AI eval dashboards fail to deliver real product improvements
• How to use error analysis to uncover your product’s most critical failure modes
• The role of a “principal domain expert” in setting a consistent quality bar
• Techniques for transforming messy error notes into a clean taxonomy of failures
• When to use code-based checks vs. LLM-as-a-judge evaluators (see the first sketch after these notes)
• How to build trust in your evals with human-labeled ground-truth datasets (see the second sketch after these notes)
• Why binary pass/fail labels outperform Likert scales in practice
• Evaluation strategies for complex systems: multi-turn conversations, RAG pipelines, and agentic workflows
• How CI safety nets and production monitoring work together to create a flywheel of continuous product improvement

References:
• Read the newsletter: https://www.lennysnewsletter.com/p/bu...
• AI Evals for Engineers & PMs: https://maven.com/parlance-labs/evals
• A Field Guide to Rapidly Improving AI Products: https://hamel.dev/blog/posts/field-gu...
• Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences: https://arxiv.org/abs/2404.12272
• Aman Khan: / amanberkeley
• Anthropic: https://www.anthropic.com/
• Arize Phoenix: https://phoenix.arize.com/
• Braintrust: https://www.braintrust.dev/
• Beyond vibe checks: A PM’s complete guide to evals: https://www.lennysnewsletter.com/p/be...
• Frequently Asked Questions (And Answers) About AI Evals: https://hamel.dev/blog/posts/evals-faq/
• Hamel Husain: / hamelhusain
• LangSmith: https://smith.langchain.com/
• Not Dead Yet: On RAG: https://hamel.dev/notes/llm/rag/not_d...
• OpenAI: https://openai.com/
• Shreya Shankar: / shrshnk

Listen:
• YouTube: / @lennysreads
• Apple: https://podcasts.apple.com/us/podcast...
• Spotify: https://open.spotify.com/show/0IIunA0...
• Substack: https://lennysreads.com/
• Newsletter: https://www.lennysnewsletter.com/subs...

Follow Lenny:
• Twitter/X: / lennysan
• LinkedIn: / lennyrachitsky
• Podcast: / @lennyspodcast

About:
Welcome to Lenny’s Reads, where every week you’ll find a fresh audio version of my newsletter about building product, driving growth, and accelerating your career, read to you by the soothing voice of Lennybot.
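A minimal Python sketch of the episode's distinction between a deterministic code-based check and an LLM-as-a-judge returning a binary pass/fail verdict. The JSON check, judge prompt, model name, and function names are illustrative assumptions, not material from the episode; it assumes the official openai SDK with an API key in the environment.

```python
# Sketch: code-based check vs. LLM-as-a-judge (assumed names and prompt)
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def code_based_check(output: str) -> bool:
    """Cheap, deterministic assertion: output must be valid JSON with a
    'summary' key. Prefer checks like this when the failure mode is
    mechanically verifiable."""
    try:
        return "summary" in json.loads(output)
    except json.JSONDecodeError:
        return False

JUDGE_PROMPT = """You are grading an AI assistant's reply.
Question: {question}
Reply: {reply}
Does the reply answer the question accurately and completely?
Answer with exactly one word: PASS or FAIL."""

def llm_judge(question: str, reply: str) -> bool:
    """LLM-as-a-judge for criteria code can't express. A binary PASS/FAIL
    verdict is easier to align with human labels than a 1-5 Likert score."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model; substitute your own
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, reply=reply)}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().upper().startswith("PASS")
```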
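And a sketch of the trust-building step: before relying on an LLM judge, compare its binary verdicts against human-labeled ground truth, reporting agreement on passes and fails separately so a judge biased toward PASS can't hide behind overall accuracy. The label pairs below are fabricated purely for illustration; in practice you would label real traces.

```python
# Sketch: validating an LLM judge against human ground-truth labels.
# (human_label, judge_label) pairs; True = pass, False = fail.
labels = [
    (True, True), (True, True), (True, False),
    (False, False), (False, True), (False, False),
]

tp = sum(1 for h, j in labels if h and j)          # agreement on human-pass
tn = sum(1 for h, j in labels if not h and not j)  # agreement on human-fail
pos = sum(1 for h, _ in labels if h)
neg = len(labels) - pos

print(f"TPR (agreement on human-pass): {tp / pos:.0%}")
print(f"TNR (agreement on human-fail): {tn / neg:.0%}")
```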