What a 100-year-old horse teaches us about AI
How do we rigorously measure AI's intelligence? We don't really know. What we know is that measuring intelligence is tricky, and if we're not careful, our tests might not measure what we intend. We explore this topic by starting with the story of Clever Hans, a horse who seemingly could do arithmetic. Later, we explain the potential limitations of today's AI benchmarks and how we could do better by looking at the established discipline of cognitive science.

▀▀▀▀▀▀▀▀▀SOURCES & READINGS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
The Project Gutenberg EBook of Clever Hans, by Oskar Pfungst: https://www.gutenberg.org/files/33936...
The Wiring of Intelligence: https://journals.sagepub.com/doi/10.1...
New and emerging models of human intelligence: https://wires.onlinelibrary.wiley.com...
NTIRE2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results: https://arxiv.org/pdf/2504.10685v1
HellaSwag: https://rowanzellers.com/hellaswag/
Are We Done with MMLU? https://arxiv.org/abs/2406.04127
Artificial cognition: How experimental psychology can help generate explainable artificial intelligence: https://link.springer.com/article/10....
o3-mini System Card: https://cdn.openai.com/o3-mini-system...
Measuring Massive Multitask Language Understanding: https://arxiv.org/pdf/2009.03300
Requiem for nutrition as the cause of IQ gains: Raven's gains in Britain 1938–2008: https://www.sciencedirect.com/science...
Observational Scaling Laws and the Predictability of Language Model Performance: https://doi.org/10.48550/arXiv.2405.1...
Introducing Claude 4 (agentic benchmarks): https://www.anthropic.com/news/claude-4
Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses: https://turntrout.com/original-truthf...
Phonological memory and vocabulary development during the early school years: A longitudinal study: https://psycnet.apa.org/doi/10.1037/0...
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark: https://arxiv.org/pdf/2412.15194
Smelling themselves: Dogs investigate their own odours longer when modified in an “olfactory mirror” test: https://doi.org/10.1016/j.beproc.2017...
Elephants' jumbo mirror ability: http://news.bbc.co.uk/2/hi/science/na...
ARC Prize 2024: Technical Report: https://arxiv.org/pdf/2412.04604
Baby Intuitions Benchmark (BIB): Discerning the goals, preferences, and actions of others: https://arxiv.org/pdf/2102.11938v1
CogBench: a large language model walks into a psychology lab: https://arxiv.org/pdf/2402.18225
The Animal-AI Environment: A virtual laboratory for comparative cognition and artificial intelligence research: https://doi.org/10.3758/s13428-025-02...
A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment: https://openreview.net/forum?id=eUkbT...

▀▀▀▀▀▀▀▀▀PATREON, MEMBERSHIP, MERCH▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🟠 Patreon: / rationalanimations
🔵 Channel membership: / @rationalanimations
🟢 Merch: https://rational-animations-shop.four...
🟤 Ko-fi, for one-time and recurring donations: https://ko-fi.com/rationalanimations

▀▀▀▀▀▀▀▀▀SOCIAL & DISCORD▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Rational Animations Discord: / discord
Reddit: / rationalanimations
X/Twitter: / rationalanimat1
Instagram: / rationalanimations

▀▀▀▀▀▀▀▀▀PATRONS & MEMBERS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Thanks to our patrons and channel members from the Simple Adder tier and above: https://docs.google.com/document/d/1p...

▀▀▀▀▀▀▀CREDITS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Credits here: https://docs.google.com/document/d/1d...