Humanity's Last Exam
This video explores Humanity’s Last Exam (HLE), an ambitious project designed to test whether artificial intelligence can match the highest levels of human expertise. As existing benchmarks like MMLU become saturated, HLE moves the goalposts to postgraduate and postdoctoral reasoning.

Inside the Video:

The New Gold Standard: Why MMLU is becoming trivia as AI surpasses the human expert ceiling.
Crowdsourced Intelligence: Developed by the Center for AI Safety and Scale AI, the exam features over 2,500 original, closed-ended questions.
Beyond Memorization: Discover how HLE uses "radioactive" UUID tags to prevent models from training on test data, ensuring we measure reasoning, not memory.
The Hard Science Focus: A breakdown of the exam's disciplines, which lean heavily into Mathematics (42%), Physics (11%), and Biology/Medicine (11%).
Confidently Wrong: An analysis of how frontier models like GPT-4o score less than 10% in strict settings and exhibit calibration errors above 80%.
The Scientific Audit: A look at the FutureHouse audit, which revealed that nearly 30% of Biology and Chemistry answers were disputed or contradicted by the literature.
The Living Benchmark: How the HLE team is transitioning to a rolling revision process that mirrors the scientific method.
The Existential Link: How the benchmark models the "Time to Failure" for civilization, estimating an AI-related failure mean of 40 years.

Key Takeaway: True Human-Level AI (HLAI) isn't about passing a static exam; it’s about adaptability and the ability to navigate a fuzzy, ambiguous research frontier where the "correct" answer is often a matter of debate.

#ArtificialIntelligence #AIBenchmark #HLE #MachineLearning #GPT4o #AIReasoning #FutureOfTech #ScienceEthics #HumanityLastExam