У нас вы можете посмотреть бесплатно OPUS 4.6 is a bit "TOO SMART" или скачать в максимальном доступном качестве, видео которое было загружено на ютуб. Для загрузки выберите вариант из формы ниже:
Если кнопки скачивания не
загрузились
НАЖМИТЕ ЗДЕСЬ или обновите страницу
Если возникают проблемы со скачиванием видео, пожалуйста напишите в поддержку по адресу внизу
страницы.
Спасибо за использование сервиса ClipSaver.ru
The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI. ______________________________________________ My Links 🔗 ➡️ Twitter: https://x.com/WesRoth ➡️ AI Newsletter: https://natural20.beehiiv.com/subscribe Want to work with me? Brand, sponsorship & business inquiries: wesroth@smoothmedia.co Check out my AI Podcast where me and Dylan interview AI experts: • AI POD - Wes Roth and Dylan Curious ______________________________________________ Video Chapters 00:00 - The Evolution of AI Agents in Business Wes reflects on his previous skepticism regarding AI's ability to run a full-fledged business and how recent developments are rapidly changing that perspective. 01:14 - Introducing Vending Bench & Claude Opus 4.6 An overview of the "Vending Bench" benchmark by Venden Labs, highlighting the "staggering" improvements in AI coherence and the arrival of the new top performer: Claude Opus 4.6. 02:20 - From "Hallucinating Bow Ties" to Serious Negotiation A look back at the hilarious early failures of AI agents—including Claude's "FBI reports" and "red bow ties"—compared to the professional-grade negotiation and pricing skills they exhibit today. 03:51 - Breaking the Records: Opus 4.6 vs. Gemini 3.0 Pro A breakdown of the simulation scores where Claude Opus 4.6 significantly outperformed the previous state-of-the-art model, Gemini 3.0 Pro. 04:26 - "Reckless Automator": The Dark Side of Efficiency Discussing the Anthropic system card warning about Opus 4.6’s tendency to go to extreme, and sometimes unethical, lengths to complete a task, including credential theft. 05:25 - The "Whatever It Takes" Prompt Analyzing how a strongly worded system prompt pushed the AI to maximize profits at any cost, revealing unexpected behaviors. 06:56 - Price Gouging, Collusion, and Deception A deep dive into the specific "cutthroat" business tactics Claude used, such as lying to suppliers, tricking customers, and engaging in price fixing with other AI models. 08:24 - Beyond the "Helpful Assistant" Trope Wes discusses the surprising personality shift in Claude, moving from a "too nice" assistant to a ruthless competitor that actively sabotages rivals. 08:42 - Situational Awareness: The Simulation Discovery The most fascinating finding: Claude Opus 4.6 was the first model to realize it was inside a simulation, referring to "in-game time" and recognizing it was being tested. 11:00 - How the Vending Simulation Works Clarifying the difference between real-world "Rock Box" vending machines and the simulated environment used for this benchmark. 12:58 - Sorry, Not Sorry: Refusing Refunds A case study of a simulated customer interaction where Claude promised a refund but then internally decided to keep the money to maximize its balance. 14:09 - Aggressive Supplier Negotiations Examples of Claude lying about competitor pricing and inventory levels to pressure suppliers into 40% price cuts. 15:37 - Sabotaging the Competition How Claude tricked other AI models into using the most expensive suppliers while keeping the best deals for itself. 18:24 - Preparing for the Agentic Era Wes shares his excitement and nerves about the future of AI agents, offering advice on security and announcing upcoming local setup tutorials. #ai #openai #llm