SlopCodeBench: Evaluating Code Quality Erosion in AI Coding Agents

AI coding assistants can solve problems, but are they writing maintainable code? In this interview, Gabriel Orlanski, PhD student at UW-Madison and lead author of SlopCodeBench, discusses a critical gap in how we evaluate coding agents: what happens to code quality over time.

Current benchmarks like SWE-bench measure whether agents can solve isolated tasks. But real software development is iterative: features get added, code gets patched, and architectural decisions compound. That's where SlopCodeBench comes in: the first benchmark designed to measure code quality erosion across multiple development checkpoints.

What You'll Learn

🔍 *The "Slop" Problem* - Why AI-generated code often feels verbose, poorly structured, and hard to maintain, even when it works
📊 *Multi-Checkpoint Evaluation* - How SlopCodeBench simulates real iterative development instead of single-shot tasks
🤖 *Surprising Model Behaviors* - Selective amnesia, library aversion, deletion phobia, and the complexity spiral
⚙️ *Real-World Impact* - Why this matters for teams deploying AI coding assistants and managing technical debt
🛠️ *The Path Forward* - What model builders and agent designers should focus on to improve code quality

Timestamps

[00:00:00] Introduction and Gabe's journey into ML for code
[00:02:52] The frustration with "slop" and what's missing in current benchmarks
[00:07:57] Design philosophy: Why hand-written problems matter
[00:10:22] SlopCodeBench and technical debt
[00:15:13] Benchmaxing: How to spot when models are over-optimized
[00:16:30] Where advanced models struggle most
[00:19:00] Recommendations for model builders and agent designers
[00:21:49] Gabe's approach to AI coding tools (skills and subagents)
[00:23:54] Building the SlopCodeBench community
[00:25:45] Fred's perspective on evaluation research
[00:26:56] Contributing to the benchmark

Key Insights

💡 Models frequently ignore code they've already written and try to reimplement it from scratch, especially in "high thinking" mode
💡 AI agents are allergic to using libraries, preferring to hand-roll implementations even for common tasks
💡 Models refuse to delete unnecessary code, leading to bloat and complexity accumulation
💡 The erosion compounds at every step as agents take the path of least resistance, patching instead of refactoring

Resources

🌐 **SlopCodeBench Website**: https://scbench.ai
📖 **Design Philosophy & Contributing Guide**: Available on the website
💬 **Discord Community**: Join via scbench.ai to contribute problems
🔗 **GitHub Repository**: https://github.com/SprocketLab/slop-c...

About the Speakers

*Gabriel Orlanski* is a PhD student at the University of Wisconsin-Madison researching ML for code, with a focus on evaluation and benchmarking. He previously interned at Replit, working on end-to-end coding agents.

*Fred Sala* is Chief Scientist at Snorkel AI, an Assistant Professor at UW-Madison, and Gabe's advisor, specializing in data-centric AI and evaluation methodologies.

*Kobie Crawford* is a Developer Advocate at Snorkel AI, focusing on AI evaluation and benchmarking.

Get Involved

SlopCodeBench is designed for community contribution! If you're opinionated about code quality and want to help build better coding agents, check out the contributing guidelines on scbench.ai. The team is looking for developers to write new multi-checkpoint problems that test architectural decision-making.

---

💬 **Join the conversation**: What's your experience with AI-generated code quality? Have you noticed these patterns? Share in the comments!

👍 If you found this interview valuable, please like and subscribe for more content on AI evaluation and coding agents.

#AIcoding #SoftwareEngineering #MachineLearning #CodeQuality #AIevaluation #SlopCodeBench #CodingAgents #TechnicalDebt