An in-depth conversation on GenAI evaluations with evals expert guest Dhruv Singh, CTO & Co-Founder of HoneyHive AI.

Summary:
Reid Mayo, Founding AI Engineer at OpenPipe (YC23), and Dhruv Singh, CTO of HoneyHive AI, discuss the complexities and importance of evaluations (evals) in LLM-backed AI applications. They explore the challenges of automated evaluation, the significance of establishing performance expectations, and the necessity of implementing evals early in the development process. The discussion also covers various types of evals, including sanity checks and cascading evaluations, and emphasizes the need for a structured approach to ensure the reliability of AI systems.

In the second half, Reid and Dhruv go deeper into sophisticated evaluation techniques for agentic AI systems, focusing on trajectory evaluations and the challenges of productionizing AI agents. They explore the use of simulations for testing AI agent performance and the need for robust evaluation pipelines that align AI outputs with human judgments. The discussion wraps up by surfacing additional resources for learning about effective GenAI evaluation strategies.

Chapters:
00:00 Introduction to AI Evaluations and Challenges
06:58 Understanding Evals: Importance and Definitions
12:10 Eval Driven Development: When to Implement Evals
15:49 Types of Evals: Fundamental Approaches
20:57 Cascading Evals: Single Step vs Multi-Step Workflows
27:05 Understanding Trajectory Evaluations in AI
30:20 The Complexity of Productionizing AI Systems
33:36 Simulations: Testing AI in Controlled Environments
35:53 The Future of Evaluation Metrics in AI
37:12 Building Robust Evaluation Pipelines
40:29 The Role of Explanations in AI Evaluations
46:07 Aligning AI Outputs with Human Judgments
49:17 Resources for Learning About AI Evaluations