Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final-answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insight into the structure and quality of the reasoning traces. In this work, the authors systematically investigate these gaps using controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures. This setup enables analysis not only of final answers but also of the internal reasoning traces, offering insight into how LRMs "think". Through extensive experimentation across diverse puzzles, they show that frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, LRMs exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, they identify three performance regimes: (1) low-complexity tasks, where standard models surprisingly outperform LRMs; (2) medium-complexity tasks, where the additional thinking in LRMs demonstrates an advantage; and (3) high-complexity tasks, where both models experience complete collapse. They also find that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across scales and problems. Finally, they investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models' computational behavior, shedding light on their strengths and limitations and ultimately raising questions about their reasoning capabilities.

In this video, I talk about the following:
- Puzzle-based thinking problems
- Accuracy of LRMs vs. compute budget for problems of varying complexity
- Overthinking and wasteful thinking in LRMs
- Providing the algorithmic solution does not help

For more details, please see https://arxiv.org/pdf/2506.06941

Shojaee, Parshin, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, and Mehrdad Farajtabar. "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity." arXiv preprint arXiv:2506.06941 (2025).

Thanks for watching!
LinkedIn: http://aka.ms/manishgupta
HomePage: https://sites.google.com/view/manishg/
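To make the puzzle-environment setup discussed above more concrete, here is a minimal sketch in Python using Tower of Hanoi, one of the puzzles studied in the paper. The function names and structure are my own illustration and not the authors' released code: a single parameter (the number of disks) controls compositional complexity, and a model's proposed move sequence can be replayed and checked step by step, which is what makes it possible to score intermediate reasoning rather than only the final answer.

# Minimal sketch of a controllable puzzle environment (Tower of Hanoi).
# Illustrative only; not the authors' implementation.

def optimal_hanoi_moves(n, src="A", aux="B", dst="C"):
    """Return the optimal move list [(disk, from_peg, to_peg), ...] for n disks."""
    if n == 0:
        return []
    return (optimal_hanoi_moves(n - 1, src, dst, aux)   # move n-1 disks onto the spare peg
            + [(n, src, dst)]                           # move the largest disk to the target
            + optimal_hanoi_moves(n - 1, aux, src, dst))  # move n-1 disks onto the target

def validate_moves(n, moves):
    """Replay a proposed move sequence; return (solved, index_of_first_failing_step)."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}  # peg A holds disks n..1, top last
    for step, (disk, src, dst) in enumerate(moves):
        if not pegs[src] or pegs[src][-1] != disk:
            return False, step          # the named disk is not on top of the source peg
        if pegs[dst] and pegs[dst][-1] < disk:
            return False, step          # illegal: larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    solved = pegs["C"] == list(range(n, 0, -1))
    return solved, None if solved else len(moves)

if __name__ == "__main__":
    n = 5                               # complexity knob: number of disks
    moves = optimal_hanoi_moves(n)
    print(len(moves), "moves; optimal is 2^n - 1 =", 2**n - 1)
    print(validate_moves(n, moves))     # expected: (True, None)

With a validator like this, difficulty can be swept simply by increasing n (the optimal solution length grows as 2^n - 1), giving the kind of precisely controllable compositional complexity that the paper uses to probe where LRM accuracy and reasoning effort collapse.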