AI & TECH

AI coding shift ends flat-fee AI, demands new architect role

Tuesday, June 2, 2026 · from 4 podcasts, 5 episodes
  • The AI subsidy era is over: labs drop unlimited plans as agents burn compute, driving a shift to token-based revenue.
  • Software engineering pivots from manual coding to ‘vibe coding’ - architectural taste and direction are the new moats.
  • AI’s capabilities overhang is real: models can build apps in 40 minutes, but companies struggle to adopt and define what to build.

Six months into the agentic AI explosion, the industry is hitting its first financial and operational walls. The flat-fee subscription model, which powered early adoption, is collapsing under the weight of autonomous workflows. Power users on $200 monthly plans were burning $10,000 in compute, a subsidy Nathaniel Whittemore notes became unsustainable. The result is a rapid shift to per-token billing, with GitHub, Google, and Anthropic all imposing usage limits.

“The economic unit of AI has officially moved from the person to the token.”

- Nathaniel Whittemore, The AI Daily Brief
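
To make the scale of that subsidy concrete, here is a small Python calculation. The $200 plan price and $10,000 compute figure come from the text and are treated as monthly amounts (an assumption). The $20-per-million-token rate in the usage-billing helper is a hypothetical placeholder, not any provider's actual price.

```python
def subsidy_multiple(plan_price: float, compute_cost: float) -> float:
    """How many times its plan price a provider spends serving one user."""
    return compute_cost / plan_price


def flat_fee_loss(plan_price: float, compute_cost: float) -> float:
    """Provider's loss per user per month under a flat fee (negative = profit)."""
    return compute_cost - plan_price


def usage_bill(tokens: int, usd_per_million_tokens: float) -> float:
    """Per-token billing: the bill scales with consumption, not with seats."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Figures from the text (assumed monthly): a $200 plan burning $10,000 of compute.
print(subsidy_multiple(200, 10_000))  # 50.0
print(flat_fee_loss(200, 10_000))     # 9800
# Hypothetical rate: $20 per million tokens, so 500M tokens bills the same $10,000.
print(usage_bill(500_000_000, 20.0))  # 10000.0
```

Under a flat fee, the provider's cost rises with usage while revenue stays fixed. Under per-token billing, revenue tracks cost, which is the shift Whittemore describes.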

This economic shift coincides with a radical change in what it means to be a software engineer. The craft's traditional moat - the friction of specialized syntax - is gone. As Naval argues, models now understand “fuzzy, sloppy English,” rendering manual implementation a legacy skill. Engineer Max Hodak hasn't written a line of code in months but is building more than ever, and Guillermo Rauch's advice is to brute-force problems by throwing multiple AI models at them iteratively.

The new bottleneck isn't typing, but thinking. Dax Raad, co-founder of OpenCode, warns that while AI makes shipping trivial, it turbocharges bad decisions, leading to Frankenstein products that are impossible to maintain. The most valuable engineers now combine “pre-AI principles and post-AI speed.” Their role is shifting to that of an architect: making high-stakes choices about system design and curating reusable blocks of infrastructure in what Guillermo Rauch, borrowing Mitchell Hashimoto's term, calls a “block economy.”

“The human role is now ‘completing the model’ through architectural decisions... it still requires a human to decide between Postgres or Clickhouse.”

- Guillermo Rauch, Naval

The friction has moved from code to compute. GPU supply is the new throttle, bottlenecking even high-growth startups. Elon Musk’s pivot - providing SpaceX’s Colossus data centers to power Anthropic’s Claude - underscores the scramble for infrastructure. Meanwhile, OpenAI and Anthropic are launching enterprise consulting arms to bridge a massive “capabilities overhang” between what agents can do and what companies can adopt. For engineers, the job is no longer to write the code, but to judge the output and steer the system that writes it.

Source Intelligence

- Deep dive into what was said in the episodes

The AI Token Shortage Begins [AI Monthly Recap] · Jun 1

  • Nathaniel Whittemore argues the AI industry is experiencing its second major transitional moment of 2026, moving from an AI subsidy era to a token scarcity era defined by structural compute shortages driving up costs.
  • Foundation model company revenue shifted from seat-based subscriptions to API token consumption as the primary economic unit, fueling explosive growth. OpenAI surged to $30B ARR and Anthropic reached a $47B annualized run rate by mid-2026.
  • Uber burned through its entire 2026 AI budget in four months, and its COO later expressed skepticism about the value derived, sparking a broader 'AI sticker shock' conversation in corporate America.
  • In response to unsustainable costs, providers are shifting from flat-rate subscriptions to usage-based billing. GitHub Copilot, Google Gemini, and Anthropic all announced changes, with Anthropic's move to per-token billing for third-party tools causing significant user backlash.
  • Nathaniel Whittemore contends companies face a massive 'capabilities overhang' where agentic AI potential far outstrips organizational ability to adopt it, prompting OpenAI and Anthropic to launch major enterprise consulting initiatives.
  • AI infrastructure is 'going vertical'. Inference provider Baseten is raising $1B at an $11B valuation, and OpenRouter raised a $13M Series B, as companies build solutions for the costly, constrained compute environment.
  • Elon Musk pivoted from promoting Grok to partnering with Anthropic, providing access to SpaceX's Colossus 1 and Colossus 2 data centers to ease Anthropic's compute constraints, effectively turning SpaceX into a neocloud provider ahead of its IPO.
  • Wall Street is favoring AI infrastructure stocks, with memory companies like SK Hynix and Micron reaching trillion-dollar valuations. Meta is also considering becoming a cloud business to monetize its massive compute investments.
  • Model releases are becoming incremental, shifting focus to the harnesses and applications. Riley Brown and Greg Eisenberg both noted that updates to environments like Claude Code and Codex now matter more than modest model improvements.
Also from this episode: (4)

AI & Tech (4)

  • The token shortage is driving market-based innovation for cheaper inference. Cursor's Composer 2.5 offers lower cost than top models, while DeepSeek made a permanent 75% price cut on its V4 model to capture cost-conscious users.
  • Sam Altman and Dario Amodei softened their public narratives about AI's disruptive impact on jobs, with Altman citing new evidence he overestimated the transformation's speed, opening space for more nuanced policy discussions.
  • AI policy debates are fracturing: some on the left, like Bernie Sanders and AOC, are calling for data center moratoriums, while Elizabeth Warren advocates novel taxation structures like token taxes instead of blocking development.
  • The White House involved itself in model release governance, partly opposing wider access to Anthropic's Mythos model due to concerns over the token shortage and a desire to reserve compute for government use.

The Case for an AI Token Tax · May 28

  • Claude Code's annualized revenue grew from $1 billion to $2.5 billion in just a couple of months during Q1 2026. Claude Cowork, launched in January, triggered emergency meetings at Microsoft.
  • Hyperscalers plan to spend $650 billion on capex in 2026, tripling their spending from a couple years prior and exceeding the inflation-adjusted cost of the US interstate highway buildout.
  • Cursor doubled its annualized revenue to $2 billion this quarter. Lovable reached $400 million ARR with a $100 million jump in one month. Replit projects $1 billion ARR by end of 2026.
  • Anthropic's share of first-time enterprise AI buyers jumped to 70%, with OpenAI at 25%. Anthropic hit a $19 billion run rate, closing the gap on OpenAI's roughly $25 billion.
  • Gartner predicts 40% of enterprises will have working agents in production by end of 2026. Pulseia, a platform for building agentic companies, reached $6 million ARR with zero employees.
  • 71% of surveyed practitioners 'vibe coded' in the past month. 62% had automation or agentic use cases. The average respondent uses 3.5 different AI models.
  • The main source of AI value shifted from time savings to increased output and new capabilities: time-savings use cases dropped from 19.9% of surveyed use cases in January to 13.6% in February.
  • Anthropic research found an 80% capability gap in legal work, where AI could handle tasks but only 15% saw adoption. Finance firms reported low AI impact with 91% citing data quality as the biggest obstacle.
  • HR adoption of AI more than tripled in 12 months, rising from 19% to 61%. Seven US states now have AI employment regulations.
  • The Pentagon designated Anthropic a supply chain risk after a dispute over Claude's use in military operations, leading to a lawsuit. OpenAI's subsequent deal with the Department of War triggered a 775% surge in one-star reviews for ChatGPT.
Also from this episode: (5)

AI & Tech (4)

  • Nathaniel Whittemore declares Q2 2026 the onset of AI's 'second moment', a shift from viable AI assistants to workable agentic systems, with higher stakes across capabilities, economics, and corporate strategy.
  • Nine major frontier AI models shipped in Q1 2026, including GPT 5.2 Codex, Genie 3, Opus 4.6, GPT 5.3 Codex, and Sonnet 4.6. Benchmarks show constant jockeying with no single winner across all use cases.
  • Agent platform OpenClaw became the most starred open-source project on GitHub ever. Nvidia CEO Jensen Huang called it potentially the most important software release ever.
  • The market for Generative Engine Optimization (GEO) was under $1 billion in 2025 but is projected to reach nearly $34 billion by 2034.

Business (1)

  • The 'SaaS Apocalypse' narrative took hold as investors feared AI was 'too good', leading to market carnage. Block cut 40% of its staff, cited as a portent for aggressive AI-era recalibration.

Google is Not a Serious Company · May 28

  • Theo says Google's 'Omni' model concept of anything-in, anything-out has one real use case: video-to-video generation for tasks like adding fire to a background, which he finds not broadly useful.
  • Theo argues most LLM-generated text is never read by humans but is still useful for reasoning and tool calls, while unviewed images and videos have no inherent value, questioning heavy investment in those modalities.
  • Theo states OpenAI's GPT-4o image generation is now useful for creating UI mockups and dashboards as business assets, not just artistic consumption, marking a shift in the modality's value.
  • Theo criticizes Google Cloud's reliability, citing a four-year history of issues and a recent incident where Google's algorithm mistakenly deleted Railway's entire account without human oversight.
  • Ben reveals a private software engineering benchmark showing GPT-4o and Claude 3.5 Opus leading, with a steep drop to Sonnet 3.6 and Gemini 1.5 Flash, and a final cliff to Gemini 1.0 Pro at 10% performance.
  • Theo ranks the AI lab hierarchy as OpenAI and Anthropic far ahead, with xAI and Cursor as potential contenders, followed by Chinese labs, and Google in last place due to stagnant trend lines.
  • Ben notes that the SpaceX IPO filing puts Anthropic's monthly compute spend on SpaceX servers at $1.25 billion, a majority of Anthropic's estimated $1.5-2 billion in monthly revenue.
  • Theo describes the fallout from Meta's acquisition of Manus: Beijing invoked a policy to undo the completed $2B deal after employees had already been onboarded, forcing Manus to try to raise $1B to buy itself back from Meta.
  • Ben details Cursor's Composer 2.5 training techniques, including reverting implemented features to generate synthetic chat logs for RLHF and using a teacher-student method to correct tool-calling errors without explicit context.
  • Theo contends Google's core failure is bureaucratic fragmentation, contrasting it with OpenAI's model of individual experts moving between teams and Vercel's company-wide 'unblock me' Slack channel that treats internal blocks as P0 issues.
  • Ben introduces Lakebed, his integrated cloud framework built in four days with GPT-4o, designed to compile a full-stack app from code with three commands, eliminating the glue work between databases, auth, and hosting.
  • Ben argues the pain of deployment has become disproportionate now that AI can build apps in 40 minutes, making the traditional 3-5 hours of cloud configuration feel like an unacceptable bottleneck.
  • Ben states Lakebed automatically syncs environment variables from a local .env file to production on deploy, adding, updating, or deleting them as needed, which he calls the right approach for 90% of apps.
  • Theo describes a novel prompt injection attack vector called 'font hacking', where a PDF uses custom glyphs to show one city name to a human but a different name to an LLM reading the underlying text encoding.
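
Ben's description of Lakebed's deploy-time sync amounts to a one-way diff in which the local .env file is the source of truth. The Python sketch below illustrates that idea under stated assumptions: it is not Lakebed's code, `parse_dotenv`, `plan_sync`, and `apply_plan` are hypothetical names, and the parser handles only plain KEY=VALUE lines (no quoting or multi-line values).

```python
from dataclasses import dataclass, field
from typing import Dict, List


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse plain KEY=VALUE lines; skip blanks and # comments.
    Assumption: no quoting, escapes, or multi-line values."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


@dataclass
class SyncPlan:
    add: Dict[str, str] = field(default_factory=dict)     # in .env, missing remotely
    update: Dict[str, str] = field(default_factory=dict)  # in both, values differ
    delete: List[str] = field(default_factory=list)       # remote only


def plan_sync(local: Dict[str, str], remote: Dict[str, str]) -> SyncPlan:
    """Diff local against remote, treating the local file as the source of truth."""
    plan = SyncPlan()
    for key, value in local.items():
        if key not in remote:
            plan.add[key] = value
        elif remote[key] != value:
            plan.update[key] = value
    plan.delete = sorted(key for key in remote if key not in local)
    return plan


def apply_plan(remote: Dict[str, str], plan: SyncPlan) -> Dict[str, str]:
    """Return the new production environment without mutating the input."""
    result = dict(remote)
    result.update(plan.add)
    result.update(plan.update)
    for key in plan.delete:
        result.pop(key, None)
    return result
```

Treating the local file as authoritative is what makes deletion automatic. The trade-off is that a secret set only in production would be removed on the next deploy, which is one reason such a default may suit most apps but not all.
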
Also from this episode: (4)

Big Tech (1)

  • Theo argues Google is not a serious company, pointing to a year-plus period of no notable frontier releases from its AI labs since Gemini 1.5 Flash, which he describes as a disaster.

Models (1)

  • Theo cites a survey showing Midjourney's user share dropping from 45% to 8%, illustrating the rapid churn in image generation tools as capabilities become commoditized within broader platforms.

AI & Tech (2)

  • Theo argues the Manus case and China's move to closed-weight models like Qwen signal a deliberate decoupling from Western AI development, ending the era of Chinese open-weight models feeding the global ecosystem.
  • Ben asserts Google's models fail at reasoning, citing their tendency to get stuck in loops or berate themselves in traces, and posits that adding reasoning was the moment Gemini fell apart competitively.

The Pragmatic Engineer

Building OpenCode with Dax Raad · May 27

  • Dax Raad argues the core bottleneck for software teams has shifted from writing code to thinking about what to build. AI speeds execution but doesn't solve the problem of deciding what to do.
  • Raad's memo to his OpenCode team warned of AI turbocharging three classic problems: shipping features that aren't worth shipping, embedding hacky workarounds, and neglecting cleanup.
  • Raad believes companies with motivated, competitive employees will leverage AI productivity gains, but most engineers in standard environments will simply use the speed to do the same work with less energy.
  • Raad says GPU supply is bottlenecking even companies of OpenCode's size. Demand is growing exponentially while production is linear, causing a capacity crunch and forcing companies to hoard and pay upfront.
  • OpenCode's business model includes Zen, an inference service that hit a $50 million run rate within five or six months, and enterprise control plane software for managing AI tool usage at scale.
  • Raad emphasizes the importance of 'taste' and irrational quality investment. He cites building their own terminal framework as an irrational move that became a key differentiator against competitors like Cline.
  • Raad notes that old software patterns like Domain-Driven Design are becoming more useful again because they provide guardrails for 'a bunch of idiots' - AI agents that work 24/7.
  • Raad advises engineers to combine software skill with deep industry expertise. Spending a year in any field makes you more knowledgeable than 99% of people, creating a powerful 'unicorn' combination.
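
Raad's point that exponentially growing demand eventually outruns linearly growing supply can be seen in a toy model. All numbers here are hypothetical: demand starts at 10 units and grows 50% per period, while supply starts at 100 units and adds 20 units per period.

```python
from typing import Optional


def first_shortfall(
    demand0: float,
    growth_rate: float,
    supply0: float,
    supply_step: float,
    max_periods: int = 100,
) -> Optional[int]:
    """Return the first period in which compounding demand exceeds linearly
    growing supply, or None if it never does within max_periods."""
    demand, supply = demand0, supply0
    for period in range(max_periods + 1):
        if demand > supply:
            return period
        demand *= 1 + growth_rate  # exponential: grows by a fixed percentage
        supply += supply_step      # linear: grows by a fixed amount
    return None


# Hypothetical numbers: demand 10 units growing 50% per period,
# supply 100 units growing by 20 units per period.
print(first_shortfall(10, 0.5, 100, 20))   # 9
# Ten times the starting supply only delays the crunch by a few periods.
print(first_shortfall(10, 0.5, 1000, 20))  # 12
```

With these numbers the shortfall arrives in period 9, and a tenfold head start in supply only pushes it to period 12, because a one-off lead is overtaken by compounding. That fits Raad's observation that companies are hoarding capacity and paying upfront.
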
Also from this episode: (5)

AI & Tech (5)

  • Raad sees product-market fit as a critical phase where AI can worsen decision-making. He says it's easy to respond to every user request or competitor feature, which results in a Frankenstein product.
  • OpenCode's growth exploded from 650k monthly active users in December 2025 to 2.5 million in January 2026 and was around 6.5 million last month.
  • Raad asserts that pure inference businesses are extremely profitable due to high margins. He claims some models have sticker prices with 80% margins for OpenCode, and giants like Anthropic and OpenAI might see 90% margins.
  • Raad criticizes viral predictions like '24-29 year olds are the most valuable asset' as defense mechanisms. He says people confidently assert futures where they are winners to manage anxiety about rapid change.
  • OpenCode capitalized on Anthropic's clumsy ban on using Claude subscriptions in third-party tools by galvanizing competitors. They secured official OpenAI support the next day, turning a crisis into a strategic win.

Waste Tokens, Save Time · May 27

  • Guillermo Rauch argues engineering excellence is now about building multiplicative software factories, not delivering individual outputs.
  • Rauch claims 100x or 1000x engineers exist in intellectual domains, citing Satoshi, Notch, Brendan Eich, and John Carmack as examples.
  • Rauch dismisses token consumption and lines of code as flawed productivity metrics, likening them to outdated management paradigms.
  • Rauch's principle for AI use is to waste tokens to save time, arguing models remain cheaper than human labor regardless of quality.
  • He advocates brute-forcing problems by throwing multiple AI models at them iteratively, trusting they will improve with each generation.
  • Naval questions whether pure software engineering is dead, suggesting hardware founders gain an advantage and model training may be the new software.
  • Rauch argues reusable building blocks are critical for AI agents, citing Mitchell Hashimoto's 'block economy' concept for scalable cooperation.
  • Since December, Max Hodak has built substantial software with AI, fulfilling long-held project fantasies without writing a single line of code.
  • Hodak states AI removes the intrinsic frustration of debugging, fundamentally changing the learning process for programming.
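
The "waste tokens to save time" principle, combined with brute-forcing a problem across several models, can be sketched as a parallel fan-out with a pass/fail check and retries. In this hypothetical Python sketch the models are plain callables standing in for real API clients, and `check` stands in for whatever validation you trust (tests, a linter, a human reviewer); none of these names come from the episode.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple

# Hypothetical type: a "model" is any function from prompt text to output text.
Model = Callable[[str], str]


def brute_force(
    task: str,
    models: Dict[str, Model],
    check: Callable[[str], bool],
    max_rounds: int = 3,
) -> Optional[Tuple[str, str, int]]:
    """Run every model on the task in parallel each round and return
    (model_name, output, round_number) for the first output that passes
    `check`, or None if no round succeeds."""
    if not models:
        raise ValueError("need at least one model")
    prompt = task
    for round_no in range(1, max_rounds + 1):
        # Fan out: all models run at once, trading extra tokens for less waiting.
        with ThreadPoolExecutor(max_workers=len(models)) as pool:
            futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
            outputs = {name: fut.result() for name, fut in futures.items()}
        # Keep the first passing candidate, in the order the models were given.
        for name, output in outputs.items():
            if check(output):
                return name, output, round_no
        # Reprompt with feedback and try again.
        prompt = f"{task}\nPrevious attempts failed the check; try a different approach."
    return None
```

The design deliberately spends tokens: every model runs every round, even when one would have sufficed, in exchange for less wall-clock time spent waiting on a single model.
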
Also from this episode: (5)

AI & Tech (5)

  • Max Hodak observes AI model performance heavily depends on user judgment and feedback, especially the quality of reprompting.
  • Hodak resisted learning prompt tricks, assuming model improvement would outpace his skill acquisition and preferring a hamfisted approach.
  • Blake Shawl notes AI models now act as principal engineers by proposing architectural trade-offs, demanding more intellectual respect.
  • Shawl highlights architectural choices like database or messaging system selection as areas where human taste and judgment still dominate.
  • Naval predicts the human-as-instruction-giver phase is temporary, foreseeing AI agents directly interfacing with APIs and paying with crypto.