GPT-5.4 scored 95% on my planning benchmark, the highest score I've ever recorded. But while I was testing it across every tool I use, a pattern showed up in the data that I genuinely did not expect, and it changes what I'd recommend.

I ran GPT-5.4, Opus 4.6, Sonnet 4.6, and Gemini 3.1 Pro through Codex CLI, Claude Code, Gemini CLI, and Cursor, all on the same planning benchmark. The benchmark measures whether a model can take a real product requirements document and build a plan that doesn't drop features. It's not a coding test; it's a planning attention test. (A rough sketch of what that kind of scoring can look like follows the chapter list below.) GPT-5.4 Extra High crushed it. But the bigger finding was what happened when I compared the same models across different tools, and what happened when I changed a single configuration in Claude Code.

If you're evaluating AI coding tools or trying to decide between Cursor, Claude Code, Codex CLI, or Gemini CLI, this video shows real benchmark data across all of them. If you use Claude Code and rely on planning mode, there's a specific finding here that could change how you work. Whether you're an engineer optimizing your AI workflow or just trying to pick the right tool, this covers model performance, tool performance, and the surprising gap between them.

The benchmark, if you want to try it: https://github.com/bladnman/planning_...

#GPT54 #AICoding #Cursor #ClaudeCode #AIBenchmark

00:00 - Intro
00:31 - Marker 3
01:54 - GPT 5.4 results
06:53 - Things got interesting
07:06 - Cursor vs. CLI
09:12 - The Auto-Eval?
10:30 - Hot Take
12:03 - Closing
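To make "doesn't drop features" concrete, here is a minimal sketch of the kind of coverage check such a benchmark could run. This is not the code from the linked repo; the feature list, the coverage_score function, and the plain substring matching are all hypothetical illustrations, and a real harness would need far more robust feature extraction and matching.

```python
"""Hypothetical sketch: score a model-generated plan against a PRD's
required features. Everything here (names, data, matching strategy)
is illustrative, not taken from the actual planning benchmark repo."""

# Features a PRD might require (hypothetical examples).
REQUIRED_FEATURES = [
    "user login",
    "password reset",
    "export to CSV",
    "admin dashboard",
]

def coverage_score(plan_text: str, features: list[str]) -> float:
    """Return the fraction of required features the plan still mentions.

    Naive substring matching; a real harness would want fuzzier
    matching (synonyms, paraphrases) or an LLM-based judge.
    """
    plan = plan_text.lower()
    missing = [f for f in features if f.lower() not in plan]
    if missing:
        print("Dropped features:", ", ".join(missing))
    return (len(features) - len(missing)) / len(features)

if __name__ == "__main__":
    sample_plan = (
        "Phase 1: implement user login and password reset. "
        "Phase 2: build the admin dashboard."
    )
    # "export to CSV" was dropped, so 3 of 4 features -> 75%
    print(f"Coverage: {coverage_score(sample_plan, REQUIRED_FEATURES):.0%}")
```

Under this toy scoring, a result like the 95% quoted above would simply mean the model's plan retained 95% of the tracked requirements.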