Haystack EU 2025: From LLM-as-a-Judge to Human-in-the-Loop: Rethinking Evaluation in RAG and Search
More: https://haystackconf.com/eu2025/talk-8/
Speakers: Fernando Rejon Barrera & Daniel Wrigley

Everyone's using LLMs as judges. In this talk, we'll explore techniques for LLM-as-a-judge evaluation in Retrieval-Augmented Generation (RAG) systems, where prompts, filters, and retrieval strategies create endless variations. This raises the question: how do you evaluate the judges?

Elo ratings in chess estimate the relative skill levels of players from their game results, with higher ratings indicating stronger players. We introduce RAGElo, an Elo-style ranking framework that uses LLMs to compare outputs without needing gold answers - bringing structure to subjective judgments at scale. Then we showcase the integration of RAGElo into the Search Relevance Workbench, released in OpenSearch 3: a human-in-the-loop toolkit that lets you dig deep into search results, compare configurations, and spot issues metrics miss. Together, these tools balance automation and intuition - helping you build better retrieval and generation systems with confidence.

Haystack is an event by OpenSource Connections.
Website: https://opensourceconnections.com/
Mail: hello@o19s.com
Mastodon: https://fosstodon.org/@o19s
LinkedIn: / opensource-connections
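
For readers unfamiliar with the Elo ratings mentioned in the description, here is a minimal sketch of the textbook Elo update applied to pairwise judgments between retrieval configurations. It is not RAGElo's implementation or API. The function names, the configuration labels ("bm25", "dense", "hybrid"), the starting rating of 1000, the K-factor of 32, and the 400-point scale are all assumptions chosen for illustration, and the judgments are made-up inputs standing in for verdicts that an LLM judge might return.

```python
from typing import Dict, List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Return updated ratings after one comparison.

    score_a is 1.0 if A won, 0.0 if B won, and 0.5 for a tie.
    k (assumed 32 here) controls how far one result moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def rank_from_judgments(
    judgments: List[Tuple[str, str, float]],
    initial: float = 1000.0,
    k: float = 32.0,
) -> List[Tuple[str, float]]:
    """Process pairwise judgments in order and return systems sorted by rating."""
    ratings: Dict[str, float] = {}
    for system_a, system_b, score_a in judgments:
        ra = ratings.setdefault(system_a, initial)
        rb = ratings.setdefault(system_b, initial)
        ratings[system_a], ratings[system_b] = update_elo(ra, rb, score_a, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical judgments: (system A, system B, score for A).
judgments = [
    ("bm25", "hybrid", 0.0),   # judge preferred hybrid
    ("hybrid", "dense", 1.0),  # judge preferred hybrid
    ("bm25", "dense", 0.5),    # judge called it a tie
]

ranking = rank_from_judgments(judgments)
for name, rating in ranking:
    print(f"{name}: {rating:.1f}")
```

No gold answers appear anywhere in this sketch: the ranking comes only from relative preferences, which is the property the description attributes to Elo-style evaluation. Because sequential Elo updates depend on the order in which comparisons are processed, a practical setup would likely shuffle or repeat the comparisons.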