Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models
The study "Hunt Instead of Wait" addresses a gap in the evaluation of agentic large language models by distinguishing between executional intelligence, where a model completes pre-defined tasks, and investigatory intelligence, which requires autonomous goal-setting and data exploration without an explicit user query. To measure this capability, the authors introduce the Deep Data Research (DDR) framework and DDR-Bench, a large-scale benchmark in which agents autonomously navigate complex databases (electronic health records, financial filings, and longitudinal behavioral data) and derive insights using tools such as SQL and Python. Rather than relying on subjective judgments, the evaluation uses an objective checklist to verify the factual accuracy of the generated insights against ground-truth data. The findings show that frontier models such as Claude 4.5 Sonnet exhibit emerging agentic behaviors and outperform their peers, but current systems still struggle with long-horizon exploration and effective self-termination. The analysis suggests that advancing investigatory intelligence depends less on scaling model size and more on developing intrinsic strategies that balance broad data coverage with focused reasoning during extended interactions.

Paper: https://arxiv.org/pdf/2602.02039