AI & TECH

Zechner warns AI agents create codebase slop faster than teams

Monday, June 15, 2026 · from 4 podcasts
  • AI coding agents generate overwhelming technical debt, forcing complete rewrites.
  • Human architects must define boundaries for agents, which lack structural reasoning.
  • Students protest AI automating entry-level jobs, breaking the white-collar career ladder.

Mario Zechner built the minimalist agent Pi after watching generative AI wreck projects. A team of human engineers might make a mess, but Zechner argues a swarm of agents can generate a million lines of incoherent 'slop' in months, necessitating a total rewrite.

The core failure is recall. Zechner notes that when an agent can't find the right context in a sprawling codebase, it hallucinates new abstractions, compounding errors in a recursive loop. His solution is a hard divide: humans define system boundaries, APIs, and critical logic; the agent merely fills in implementation.

"A team of humans can mess up a project, but 100 agents working for three months will generate enough 'slop' to necessitate a total rewrite."

- Mario Zechner, The Modern Software Developer
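
One way to picture the divide Zechner describes is a human-written interface plus a human-written acceptance check, with only the implementation behind it delegated. The sketch below is illustrative, not Zechner's or Pi's code: the names `RateLimiter`, `check_contract`, and `FixedWindowLimiter` are hypothetical, and a rate limiter is simply a convenient example of a bounded component.

```python
from typing import Protocol


class RateLimiter(Protocol):
    """Human-defined boundary (hypothetical): the only surface callers see."""

    def allow(self, key: str, now: float) -> bool:
        """Return True if a request for `key` at time `now` (seconds) is permitted."""
        ...


def check_contract(limiter: RateLimiter) -> None:
    """Human-written acceptance check, assuming a limit of 2 per 1-second window."""
    assert limiter.allow("a", 0.0)
    assert limiter.allow("a", 0.1)
    assert not limiter.allow("a", 0.2)  # third call inside the window is rejected
    assert limiter.allow("b", 0.2)      # keys are tracked independently
    assert limiter.allow("a", 1.5)      # the window has elapsed, so the count resets


class FixedWindowLimiter:
    """The kind of implementation detail that could be handed to an agent."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        # key -> (window start time, requests counted in that window)
        self._state: dict[str, tuple[float, int]] = {}

    def allow(self, key: str, now: float) -> bool:
        start, count = self._state.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # start a fresh window
        if count >= self.limit:
            return False
        self._state[key] = (start, count + 1)
        return True


# Any implementation, human- or agent-written, must pass the same check.
check_contract(FixedWindowLimiter(limit=2, window_s=1.0))
```

The design choice is that the interface and the check stay under human review, so an implementation that drifts from the boundary fails deterministically instead of relying on the agent to remember the rules.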

Jeffrey Cannell of Nous Research sees this power shift moving beyond coding. He reports reaching 'functional AGI' where AI is as good as the best humans on specific tasks, leading him to abandon manual coding entirely. The panel on This Week in AI agreed Claude Opus 4.5 was the inflection point.

This automation is triggering a backlash. Cannell observes students booing AI at graduation ceremonies, recognizing that the entry-level roles they studied for are being deleted before they start. The traditional apprenticeship model is collapsing.

"Students are booing AI at commencement because the career ladder lost its bottom rungs."

- Jeffrey Cannell, This Week in AI

Brilliant CEO Sue Kim is betting on a different path. Her AI tutor Koji uses Socratic methods to force cognitive struggle, not bypass it. She built the system on seven years of deterministic logic, arguing frontier LLMs have plateaued because they lack a verifiable reward signal for actual learning.

Nathaniel Whittemore notes the agentic leap is accelerating. He reports Stripe used Anthropic's new Fable 5 model to compress months of migration work on a 50-million-line Ruby codebase into days. The required skill is now 'task imagination' - defining responsibilities an autonomous loop can handle for hours without oversight.

The era of all-you-can-eat subscriptions is ending. Anthropic is moving Fable 5 to strict pay-per-token pricing, forcing users to classify tasks and reserve the 'jet engine' for complex migrations. The frontier is becoming a scarce, expensive utility.

Source Intelligence

- Deep dive into what was said in the episodes

The Modern Software Developer

Pi Building Pi, Openclaw's Minimalist Coding Agent | Mario Zechner, Creator of Pi · Jun 14

  • Mario Zechner argues current models lack sufficient RLHF data on software architecture and design, making them ineffective at structuring solutions.
  • Zechner uses agents on modular, well-architected code where boundaries are clear, but reserves final oversight for mission-critical and security-related components.
  • Zechner built Pi, a minimalist coding agent harness based on a small, extensible core that users can modify themselves to fit workflows, opposing heavy feature-driven designs.
  • Zechner avoids MCP integrations in Pi, citing issues with server implementations wasting context tokens on tool definitions and preferring direct CLI use.
  • Zechner's workflow for bug fixes includes using Pi with an issue prompt template to fetch, label, and analyze GitHub issues, verifying the analysis before implementing.
  • Zechner manually reviews agent-generated code to combat unnecessary abstraction and complexity, using a custom Pi extension to provide inline feedback.
  • Zechner's agents.md file defines coding style and rules, but he notes models often ignore it, so he relies more on deterministic linting and type-checking for enforcement.
  • Zechner says agents can massively degrade a codebase faster than human teams, requiring ruthless refactoring, but believes they can also assist in that cleanup.
  • Zechner uses GPT-5.5 as his daily driver for code but switches to Claude for prose, and dabbles with open-weight models like Kimi 2.6 and DeepSeek.
  • Zechner avoids automatic worktree creation in Pi, citing distrust of models handling complex git operations and relying on modular code to prevent file conflicts.
  • Zechner refactors large codebases by first using the agent to explore and summarize relevant files, then carrying that summary into a separate implementation branch within the session.
  • Zechner built a robot with a Pi brain over 12 hours, using voice-to-text and agent-generated frontend code, then refactored the messy result by modularizing tool implementations.
  • Zechner advocates adversarial agent roles to push back on user ideas and prevent sloppy code, referencing Matt Shumer's 'roast me' skill as an example.

Fable 5 Raises the Bar for AI Ambition · Jun 10

  • Anthropic launched Claude Fable 5, its first 'Mythos-class' model, which Nathaniel Whittemore describes as 'fairly undisputedly the best AI model we have ever been able to use'.
  • Fable 5 significantly outperformed competitors on key benchmarks. On SWE-bench Pro it scored 80.3% versus GPT-5.5's 58.6%, and it achieved 29.3% on the new Frontier Code benchmark, more than double Opus 4.8's 13.4%.
  • Mythos 5, the less-safeguarded counterpart to Fable 5, is initially only available to Project Glasswing partners, including the US government, with plans for a broader trusted access program later.
  • Anthropic implemented strict content guardrails on Fable 5, automatically routing requests related to cybersecurity, biology, chemistry, or 'distillation' (AI research) to Claude Opus 4.8 instead of refusing them outright.
  • Early adopters reported transformative use cases, including Stripe using Fable 5 to compress months of engineering into days for a 50-million-line Ruby codebase migration, and Allie K. Miller noting it could solve MBA-level word math problems with zero babysitting.
  • API pricing for Fable 5 is set at $10 per million input tokens and $50 per million output tokens, double the cost of Opus but less than half the cost of the Mythos Preview within Project Glasswing.
  • Anthropic's data retention policy for Mythos-class models mandates that prompts and outputs are retained for 30 days for trust and safety purposes, a move criticized for creating enterprise compliance challenges.
  • Felix Rieseberg of Anthropic argued Fable 5 signals a shift from users giving AI 'tasks' to assigning 'responsibilities' or autonomous loops, such as having an agent monitor all crash reports instead of just fixing a single bug.
  • Nate B. Jones described the critical new skill for the Fable 5 era as 'task imagination' - the ability to conceive of ambitious, multi-hour projects to delegate, moving beyond small, incremental AI tasks.
  • Whittemore predicts users will need to develop 'use case classification' skills to optimize token efficiency, consciously matching different tasks to the appropriate model power level as high-end models like Fable 5 move to usage-based pricing.
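
The pricing arithmetic behind that classification habit can be sketched as follows. The rates are the $10 and $50 per million tokens reported above; the function name `estimate_cost_usd` and the job sizes are hypothetical, chosen only for illustration.

```python
# Fable 5 API prices as reported in the episode (USD per million tokens).
FABLE5_INPUT_PER_MTOK = 10.0
FABLE5_OUTPUT_PER_MTOK = 50.0


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_mtok: float = FABLE5_INPUT_PER_MTOK,
    output_per_mtok: float = FABLE5_OUTPUT_PER_MTOK,
) -> float:
    """Estimate the USD cost of one job under per-token pricing."""
    input_cost = (input_tokens / 1_000_000) * input_per_mtok
    output_cost = (output_tokens / 1_000_000) * output_per_mtok
    return input_cost + output_cost


# Hypothetical job sizes, for illustration only.
small_fix = estimate_cost_usd(input_tokens=50_000, output_tokens=5_000)
big_migration = estimate_cost_usd(input_tokens=2_000_000, output_tokens=400_000)

print(f"small fix:     ${small_fix:.2f}")      # 0.05 * 10 + 0.005 * 50 = $0.75
print(f"big migration: ${big_migration:.2f}")  # 2 * 10 + 0.4 * 50 = $40.00
```

Passing different `input_per_mtok` and `output_per_mtok` values gives the same estimate for a cheaper model, which is the comparison a user would make when deciding whether a task justifies the top tier.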

Hermes Agent, NotebookLM & LiveKit Founders on the AI Agent Race | TWiAI 17 · Jun 10

  • Jeffrey Cannell reports Hermes Agent is now ranked number one on OpenRouter and recently launched a desktop app, marking rapid growth over the last three months.
  • Steven Johnson explains that NotebookLM is built on a source-grounded AI experience, providing state-of-the-art citations and audio overviews. Its most significant update merges its separate research, creation, and source-analysis agents into a single chat agent.
  • Russ D'Sa reveals LiveKit powers voice AI for high-profile clients including Spotify, Tesla's support and service centers, Grok Voice, Salesforce's Agentforce, and SAP's Joule.
  • Steven Johnson contrasts Harvard Law's mandatory use of NotebookLM for a constitutional law class with Berkeley Law's restrictive AI policy that only permits AI for finding sources.
  • Jeffrey Cannell argues AI agents will automate much entry-level work, creating a disconnect between college preparation and a tightening job market.
  • Steven Johnson advocates using AI as a world-class tutor and editor to amplify cognitive processes rather than bypass learning, a framework he believes would make AI skills valuable in any future job market.
  • Panelists critique Apple's new Siri AI for a persistent user experience problem where users don't know its capabilities, making it slower than using a browser, and for lacking a conversational, human-like interaction flow.
  • Steven Johnson is optimistic about Apple's standalone Siri app as a potential new AI application paradigm, citing Apple's history with breakthrough apps like GarageBand and HyperCard.
  • Jeffrey Cannell suggests Apple may have avoided training frontier models because the costs are prohibitive and a fourth player was unnecessary, instead partnering with Google and investing in open-source via their MLX platform for Apple Silicon.
  • Russ D'Sa predicts the ultimate winners in AI will be platforms that transcend specific devices for digital work automation and companies focused on embodied AI robots for physical chore automation, not device-centric players like Apple.
  • Jeffrey Cannell describes reaching 'functional AGI' where on specific tasks, AI is as good as the best humans, citing his own transition from writing code manually to using AI for all coding work.
  • Panelists agree Claude Opus 4.5 was the inflection point where AI coding models crossed a threshold to become better than human developers, leading to a phase of rapid, reliable agentic automation.
  • Jeffrey Cannell identifies corporate 'token maxing' as a failure case where employees use unlimited AI budgets inefficiently, while high-performers can be worth 10x the token spend, a value hard to assess at large scale.
  • Russ D'Sa notes his top engineers spend as much as $10k to $15k a month on AI tokens, which he considers a high-value investment that makes them vastly more productive.
  • Jeffrey Cannell states current smaller local models lack the quality for coding agents compared to frontier models, and the scaling trajectory points to ever-larger models, making local high-performance compute a niche.

The AI Tutor That Makes Kids Actually Think | E2298 · Jun 8

  • Sue Kim says Brilliant teaches problem solving over procedural knowledge, a more transferable skill than memorizing formulas. She says school math often fails when students encounter unfamiliar problems.
  • Brilliant’s new AI tutor Koji launched last week and went viral with nearly 5 million views on X. Kim says the success shows consumer demand for AI that makes you think, not AI that replaces thinking.
  • Koji is Socratic, uses interactive canvases that LLMs can read and write to, and gradually removes visual scaffolding as students reach mastery. The core pedagogy and mathematical correctness are deterministic systems built over seven years.
  • Sue Kim says Brilliant’s pricing is benchmarked against human tutors, not casual apps. The goal is a product that does 95% of a tutor's job for 30 dollars a month, a fraction of the typical 10,000 dollar annual tutoring cost.
  • Sue Kim says Brilliant chose a direct-to-consumer model over B2B sales to schools to stay close to learner feedback. They read every app store review and customer email for real-time product development insights.
  • Sue Kim says the ability of frontier LLMs to tutor well has plateaued since GPT-3.5 because they lack verifiable reward signals for learning outcomes. Brilliant's unique dataset of tutoring sessions provides that signal for model improvement.
  • Sue Kim says Brilliant’s vision is a world-class tutor in every home for every subject and language. They are expanding from math and coding into science and younger age groups, leveraging LLMs for high-quality localization.
Also from this episode: (5)

Business (3)

  • Jason Calacanis says startup founders should ignore traditional TAM analysis for novel ideas, citing Airbnb and eBay as companies that induced entirely new markets. He says bad VC behavior often stems from an inability to assess non-existent markets.
  • Jason Calacanis explains his firm's process to improve founder feedback scores. He mandates that every first meeting ends with the investor repeating the founder's vision back to them to ensure understanding.
  • Jason Calacanis tells a story of a VC firm canceling a meeting while he was driving to it after a cross-country flight. He confronted the investor, calling him the worst venture capitalist of all time.

Startups (2)

  • Jason Calacanis recounts a story where John Doerr attended a pitch meeting directly from the emergency room after a biking accident, viewing it as a sign of ultimate commitment despite Doerr being groggy.
  • Sue Kim says 40% of Brilliant’s users are in the US, with 60% international. This drove the choice of the name Koji, which is short, globally accessible, and not tied to a specific language.