A single AI mistake can cost millions — or delete your entire database. In this NIAT Tech Decode Masterclass, Savio Fernandes (Staff Machine Learning Engineer, PayPal) explains how to evaluate AI agents before they fail — using real-world metrics, frameworks, and feedback loops that make autonomous systems safe, reliable, and production-ready.

We've moved beyond testing prompts. Today's AI agents reason, plan, and act — and they need to be evaluated like systems, not chatbots. This session breaks down exactly how industry teams at PayPal, OpenAI, and Anthropic measure agent behaviour, performance, and risk.

Here's what you'll learn:
- Agent Evaluation vs. LLM Evaluation: Why testing an LLM's answers isn't enough — and how agent evaluation checks behaviour, tool use, and decisions.
- Failure Taxonomy: How to identify cognitive, operational, alignment, and business failures before they reach production.
- Core Metrics: Track task success rate, tool usage efficiency, latency, cost, and human feedback signals with structured scoring rubrics (a minimal rubric is sketched below).
- Trace-Based Evaluation: Use LangSmith to visualise every step of your agent's reasoning — from input → tool → output — and debug with precision (see the tracing sketch below).
- Automated Evaluation Pipelines: Integrate tools like DeepEval, Ragas, and OpenAI Evals for large-scale testing and continuous monitoring (see the DeepEval sketch below).
- Feedback Loops: Build human-in-the-loop systems that log, learn, and improve with every user interaction.
- Advanced Topics: Multi-agent evaluation (handoff smoothness, deadlock detection), governance, bias auditing, and compliance tracking.

Whether you're building a travel agent, customer support bot, or enterprise AI assistant, this session teaches how to:
• Measure what really matters — reliability, safety, and business success.
• Turn evaluation into an ongoing system, not a one-time checklist.
• Use data-driven feedback loops to keep your AI agents grounded and compliant.

Because AI isn't just about intelligence anymore. It's about accountability.

Link to code files: https://drive.google.com/drive/folder...

#AIAgents #LLM #AgentEvaluation #LangSmith #DeepEval #AITrust #AIMetrics #AI #ArtificialIntelligence #MachineLearning #AIAudit #AIEthics #AgentSafety #AgentBenchmarks #FeedbackLoops #AIEvaluation #AgentTesting #AIDeployment #TechDecode
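
For the core-metrics bullet, here is a minimal sketch of how per-run signals (task success, tool-use efficiency, latency, cost, human ratings) could be aggregated into a scoring rubric. The `AgentRun` fields and `summarize` helper are illustrative, not the session's actual code:

```python
from dataclasses import dataclass

@dataclass
class AgentRun:
    task_succeeded: bool      # did the agent complete the user's goal?
    tool_calls_made: int      # actual tool invocations in the trace
    tool_calls_needed: int    # minimal calls a reference solution would use
    latency_s: float          # end-to-end wall-clock time for the run
    cost_usd: float           # token + tool cost for the run
    user_rating: int | None   # optional 1-5 human feedback signal

def summarize(runs: list[AgentRun]) -> dict[str, float]:
    """Aggregate per-run signals into dashboard-level metrics."""
    n = len(runs)
    rated = [r.user_rating for r in runs if r.user_rating is not None]
    efficiency = [
        min(r.tool_calls_needed / r.tool_calls_made, 1.0)
        for r in runs
        if r.tool_calls_made > 0
    ]
    return {
        "task_success_rate": sum(r.task_succeeded for r in runs) / n,
        "tool_usage_efficiency": sum(efficiency) / len(efficiency) if efficiency else 0.0,
        "avg_latency_s": sum(r.latency_s for r in runs) / n,
        "avg_cost_usd": sum(r.cost_usd for r in runs) / n,
        "avg_user_rating": sum(rated) / len(rated) if rated else float("nan"),
    }
```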
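
For the trace-based evaluation bullet, a minimal sketch of LangSmith's `@traceable` decorator wrapping a hypothetical travel-agent tool and entry point. It assumes the `langsmith` package is installed and the LangSmith API key and tracing environment variables are set, so each nested call shows up as a step in the input → tool → output trace:

```python
from langsmith import traceable

@traceable(run_type="tool", name="search_flights")
def search_flights(origin: str, destination: str) -> list[dict]:
    # Hypothetical tool: a real agent would call a flights API here.
    return [{"flight": "XY123", "origin": origin, "destination": destination}]

@traceable(name="travel_agent")
def run_agent(user_input: str) -> str:
    # Each nested @traceable call becomes a child step in the LangSmith trace,
    # so the reasoning chain can be inspected, debugged, and scored per step.
    options = search_flights("SFO", "JFK")
    return f"Found {len(options)} option(s) for: {user_input}"

if __name__ == "__main__":
    print(run_agent("Book me a flight from SFO to JFK"))
```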
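
For the automated-pipeline bullet, a minimal DeepEval sketch, assuming its `LLMTestCase`/`AnswerRelevancyMetric` API and an `OPENAI_API_KEY` for the judge model; the travel-agent example values are made up:

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# One logged agent interaction turned into a test case (values are illustrative).
test_case = LLMTestCase(
    input="Find me a refundable flight from SFO to JFK next Friday",
    actual_output="I found 3 refundable SFO-JFK flights on Friday; the cheapest is $212.",
)

# LLM-as-judge metric: scores whether the output actually addresses the input.
metric = AnswerRelevancyMetric(threshold=0.7)

# Runs the metric over the test case(s) and reports pass/fail against the threshold.
evaluate(test_cases=[test_case], metrics=[metric])
```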