Is Grok 4 OVERHYPED? Deep Dive + Comparisons
Elon Musk and xAI claim that Grok 4 and Grok 4 Heavy are the smartest AIs on Earth. I dropped $300 on the SuperGrok plan to find out if it can beat ChatGPT Pro at the LLM game.

Benchmarks like Humanity's Last Exam and ARC-AGI-2 say that Grok 4 is the best, too. I don't trust the benchmarks anymore. It's not that they're untrustworthy. But for about a year now, they haven't reflected the reality of AI power users. They're too easily gamed and they don't ask the sort of questions we do.

Let's put Grok 4 and o3, and their more intense variations, Grok 4 Heavy and o3-pro, to the ultimate test. It's Elon versus OpenAI, as always.

Timestamps
00:00 Grok 4 and AI Benchmarks
00:18 Latest ranking of AI models
05:27 First comparison (product comparison)
08:18 Second comparison (politics)
16:08 Third comparison (web search)
17:30 Fourth comparison (analysis and writing)
21:42 Fifth comparison (research and synthesis)
24:35 The final results!
25:58 What's with Grok's weird behavior?
27:37 The advantage of X (bonus comparison)
30:34 Final takeaways

We tested the models on practical applications that show us fundamental capability, including:

Web search and tool use: how well do these models retrieve facts, gather disparate data, and use outside knowledge, even for something as straightforward as looking for a particular type of tech startup?

Analysis and synthesis: which model does better at handling complex topics like a mayoral candidate's tax proposal for NYC? How well does the model put together information after it does research?

Research and detail-orientation: is o3 or Grok 4 better at heavy analysis of technical products, looking up purchasing prices across multiple retailers, and not missing specification details about a newly released AI model called K2?

Writing and creativity: how well does each model do at understanding core concepts and turning them into readable writing, with a Substack or Twitter audience in mind?

Why real tasks?
Because benchmarks are gameable. Power users judge LLMs on messy, unstructured work that mirrors real life.

By the way... if you're interested in taking your own AI muscle to the next level, consider joining my AI Fluency Bootcamp. It's a live, cohort-based course where we skip the BS of prompt templates and n8n workflows. In just 10 days, you get a deep, expert-level foundational understanding of how large language models actually work, the nuance behind the future of artificial intelligence & agents, and practical sprints to help you 10x your AI outcomes. More details here: https://aimuscle.com/fluency

Here's what a recent student had to say:

"This course is superb and Sherveen's enthusiasm for the subject is infectious. He provides so much high-level material that at first, it seemed almost impossible to grasp. Little by little, though, I realized I was getting the important concepts - and there are a lot. Sherveen is always accessible, no question is too 'dumb,' and he always encouraged everyone to participate. I can’t recommend this course enough."

Key takeaways:

Whether you compare these two models head to head or against Google's Gemini 2.5 Pro and Anthropic's Claude Sonnet & Opus 4, the top-tier champion is still OpenAI's latest in the form of o3 and o3-pro.

True generalized intelligence can reveal itself in just a few prompts. Benchmarks have their place, but if you're trying to make a choice about what AI to use or pay for, you have to find the vibes.

If you're going to use these products, start to understand how they work! The better your inputs and your understanding of a model's task-specific strengths, the better your ROI in this new AI era.

Want to get updates, including livestreams of more AI research and testing, walkthroughs of "vibe coding" a real world product, and a newsletter about the latest? Sign up here: https://aimuscle.com/

Links to the raw chat outputs:

Computer mouse recommendation:
o3: https://chatgpt.com/share/68783261-05...
Grok 4: https://grok.com/share/bGVnYWN5_73fc0...

NYC mayoral tax plan:
o3-pro: https://chatgpt.com/share/686fd392-d8...
Grok 4 Heavy: https://grok.com/share/bGVnYWN5_1431d...

Find me a $10M startup:
o3: https://chatgpt.com/share/687832d8-ad...
Grok 4: https://grok.com/share/bGVnYWN5_80dae...

Write a newsletter:
o3-pro: https://chatgpt.com/share/687832ed-87...
Grok 4 Heavy: https://grok.com/share/bGVnYWN5_f8710...

Waymo versus Tesla:
o3-pro: https://chatgpt.com/share/6878330f-5a...
Grok 4 Heavy: https://grok.com/share/bGVnYWN5_b1b86...

Tell me about Kimi's K2 agentic AI:
o3: https://chatgpt.com/share/68783336-bf...
Grok 4: https://grok.com/share/bGVnYWN5_bf6bf...