In this AI Research Roundup episode, Alex discusses the paper: 'SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks'

SkillsBench is the first comprehensive benchmark designed to evaluate how structured procedural knowledge augments LLM agents across 86 diverse tasks. The researchers ran 7,308 trajectories to compare curated skills against self-generated ones, finding that curated instructions significantly boost success rates. Notably, smaller models with access to these skills can match the performance of much larger models that lack them. However, the study also shows that LLMs cannot yet reliably author their own skills: self-generated procedural knowledge provided no performance benefit. This work offers a much-needed standard for measuring how effectively procedural knowledge expands LLM agent capabilities.

Paper URL: https://arxiv.org/pdf/2602.12670

#AI #MachineLearning #DeepLearning #LLMAgents #SkillsBench #ProceduralKnowledge #Benchmarks
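For readers who want the shape of the evaluation, here is a minimal, hypothetical Python sketch of the three-way comparison the episode describes (no skill vs. curated skill vs. self-generated skill). The names Task, run_agent, and self_generate are illustrative assumptions, not the paper's actual harness:

```python
# Hypothetical sketch of a skill-conditioned evaluation loop in the spirit
# of SkillsBench. Task, run_agent, and self_generate are assumed names for
# illustration only, not the paper's code.
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional


@dataclass
class Task:
    prompt: str                    # the task the agent must solve
    check: Callable[[str], bool]   # verifier: did the trajectory succeed?
    curated_skill: Optional[str]   # human-written procedural instructions


def run_agent(prompt: str, skill: Optional[str]) -> str:
    """Placeholder agent rollout: prepend the skill (if any) to the system
    context, then act. Swap in a real LLM agent call here."""
    system = (skill + "\n\n" if skill else "") + "You are a capable agent."
    return f"[trajectory for {prompt!r} under system {system[:30]!r}...]"


def success_rate(tasks: list[Task],
                 skill_for: Callable[[Task], Optional[str]]) -> float:
    """Fraction of tasks whose trajectory passes that task's verifier."""
    return mean(float(t.check(run_agent(t.prompt, skill_for(t))))
                for t in tasks)


def compare(tasks: list[Task],
            self_generate: Callable[[Task], str]) -> dict[str, float]:
    """Success rates under the three conditions the episode contrasts."""
    return {
        "no_skill": success_rate(tasks, lambda t: None),
        "curated": success_rate(tasks, lambda t: t.curated_skill),
        "self_generated": success_rate(tasks, self_generate),
    }
```

Under this framing, the reported findings would correspond to compare() showing 'curated' well above 'no_skill', with 'self_generated' sitting near the baseline.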