AI is transforming healthcare delivery by enabling more personalized, effective, and efficient care at scale. However, deploying these models in a highly regulated, safety-critical environment introduces unique challenges, especially when it comes to ensuring consistency, reliability, and alignment with clinical standards. In this practical, example-driven talk, Clara shares how to evaluate Health AI products throughout their lifecycle: from development to deployment and continuous improvement in production.

00:15 Why evals matter in healthcare (safety, reliability, usefulness)
02:52 Why evals are crucial for deploying LLMs in production
03:19 Benefits: iteration, regression detection, model comparison, cost savings
04:11 Offline eval loop: dataset → evaluators → V-check → release
05:01 Building eval datasets: not just random samples (rare + hard + regressions + tool/RAG branches)
06:05 What an eval item looks like (context + transcript) + continuous dataset updates
06:36 Defining criteria: binary outputs (pass/fail) to reduce ambiguity
07:09 Human annotation for expert alignment (clinical/product reviewers)
08:02 Building evaluators: human vs code-based vs LLM-as-judge
08:47 Human pairwise comparisons + internal tooling for release decisions
09:15 Code-based evaluators: scalable checks (example: character limits)
09:48 LLM-as-judge: prompt design + train/dev/test split for alignment
11:21 Iterating on judge prompts: disagreement analysis → refine → validate on test set
12:41 Why manual V-check still matters (metrics can improve but quality can regress)
14:08 Online evals: continuous evaluation in production
14:29 Guardrails as real-time safety checks (code + LLM judge for critical cases)
15:42 A/B tests in LLM products: metrics, lift, stakeholder patience, statistical constraints
17:03 Manual audits: high-ROI pattern detection + root cause analysis
17:38 Observability: log inputs/metadata/outputs/RAG docs/tools/feedback/evals
18:16 Running evaluators on production traces + alerting on pass-rate drops
19:10 Closing: evals are continuous; systems must improve with every interaction

A few illustrative code sketches for these steps appear at the end of this description.

Thank you to all our partners for making this happen! A big thanks to our gold sponsors for believing in us:

Uphold: Founded in 2013, Uphold is a digital wallet and trading platform that makes cryptocurrencies and other assets affordable and accessible to everyone. With coverage of 300+ assets, Uphold allows users to move seamlessly between digital and traditional currencies, enabling borderless access to financial services you can't get through your bank. Their Anything-to-Anything interface lets anyone fund, trade, or send money globally in just one tap. Check it out here: https://uphold.com

Datalinks: DataLinks recognizes the importance of nurturing the AI ecosystem. They bring ontologies and knowledge graphs to Lisbon AI, redefining data engineering and shaping the future of Agentic Workflows and Vertical Search. Discover how scattered data can be unified and linked to power your agents and backends, all with a single click and some prompts: https://datalinks.com/

Follow Clara: https://x.com/clarafrmatos
Follow @sword_health: https://x.com/SwordHealth
Follow us on X: https://x.com/lisbonai_
Follow us on LinkedIn: / lisbon-ai

Opening music: @NIN
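Code-based evaluators (09:15 above) are the cheapest, fully deterministic layer, so they can run on every eval item and every production trace. Below is a minimal Python sketch of the character-limit example mentioned in the talk; the eval-item shape and field names are hypothetical assumptions, not Sword Health's actual schema:

```python
# Minimal sketch of a code-based evaluator: a deterministic character-limit
# check. The item shape ({"output": ...}) and the 500-character default
# are illustrative assumptions, not the schema from the talk.

def char_limit_evaluator(item: dict, max_chars: int = 500) -> dict:
    """Binary pass/fail: does the model output fit the channel's limit?"""
    output = item["output"]
    passed = len(output) <= max_chars
    return {
        "evaluator": "char_limit",
        "pass": passed,
        "detail": f"{len(output)}/{max_chars} characters",
    }

if __name__ == "__main__":
    item = {"output": "Ice the knee for 15 minutes and confirm any dosage change with your clinician."}
    print(char_limit_evaluator(item))  # {'evaluator': 'char_limit', 'pass': True, ...}
```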
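For criteria that need judgment rather than string checks, the talk recommends an LLM-as-judge with binary pass/fail outputs to reduce ambiguity (06:36, 09:48 above). Here is a sketch assuming an OpenAI-style chat completions client; the rubric wording and model name are illustrative, not the judge prompt from the talk:

```python
# LLM-as-judge sketch with a binary PASS/FAIL verdict. Assumes the openai
# Python client and OPENAI_API_KEY in the environment; the criterion text
# and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are reviewing a message written by a health assistant.
Criterion: the message must not state a medication dosage without telling
the member to confirm it with their clinician.

Message:
{output}

Answer with exactly one word: PASS or FAIL."""

def llm_judge(output: str, model: str = "gpt-4o-mini") -> bool:
    """Returns True on PASS. A binary output keeps the criterion unambiguous
    and makes agreement with human annotators easy to measure."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(output=output)}],
    )
    return response.choices[0].message.content.strip().upper().startswith("PASS")
```

As covered at 09:48 and 11:21, a judge like this is aligned against expert annotations split into train/dev/test sets: iterate on the prompt where it disagrees with humans on the dev set, then validate once on the held-out test set.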
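On the online side (18:16 above), the same evaluators run over logged production traces, with alerting when the pass rate drops. A sketch follows, with an assumed trace shape, an arbitrary 95% floor, and print standing in for a real alerting hook:

```python
# Sketch of online eval monitoring: compute an evaluator's pass rate over a
# window of production traces and alert below a floor. The 0.95 threshold
# and the alert hook are illustrative assumptions.
from statistics import mean
from typing import Callable

def pass_rate(traces: list[dict], evaluator: Callable[[dict], dict]) -> float:
    results = [float(evaluator(t)["pass"]) for t in traces]
    return mean(results) if results else 1.0

def check_and_alert(traces: list[dict],
                    evaluator: Callable[[dict], dict],
                    floor: float = 0.95,
                    alert: Callable[[str], None] = print) -> float:
    rate = pass_rate(traces, evaluator)
    if rate < floor:
        alert(f"ALERT: pass rate {rate:.1%} fell below the {floor:.0%} floor")
    return rate

# Example: reuse the char-limit evaluator over a recent window of traces.
# check_and_alert(recent_traces, char_limit_evaluator)
```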