AI & TECH

Baseten CEO says zero slack compute forces 5-year GPU lock-ins

Monday, May 4, 2026 · from 5 podcasts
  • AI compute demand has outstripped supply, forcing three-to-five-year contract lock-ins with heavy prepays.
  • Over 95% of tokens served on Baseten now come from custom-tuned models, not raw open-source weights.
  • The physical buildout of data centers is causing immediate inflation, not the AI productivity gains forecast by central banks.

The subsidy era for AI is over. Baseten CEO Tuhin Srivastava reports clusters running at mid-90s utilization, with zero slack compute. Getting a significant allotment of the latest GPUs now requires three-to-five-year contracts and a prepay of 20-30% of total contract value, turning AI startups into capital-intensive operations overnight.

This structural scarcity is not just a chip shortage. On The Intelligence from The Economist, Shailesh Chitnis noted lead times for electrical transformers and switches now stretch to five years. Even with giants like Amazon and Microsoft spending hundreds of billions, money can't instantly create land or water, or overcome local opposition to new data centers.

Steve Hou argues on Forward Guidance that this physical friction is the primary economic impact so far. The massive capital investment is cushioning the US economy, but it’s also competing for energy, hardware, and specialized trades like electricians and plumbers. Hou warns the Federal Reserve against cutting rates based on hoped-for AI-driven disinflation, saying the direct inflationary pressure from the buildout is more immediate.

"The immediate impact is inflationary. Huge capital investments in data centers compete for energy, hardware, and specialized labor."

- Steve Hou, Forward Guidance

As compute tightens, the market is moving beyond raw models. Srivastava reveals that over 95% of tokens served on Baseten are from custom models where customers post-train on their own reward signals. This creates massive stickiness; while GPUs are a commodity, the software layer managing these bespoke workloads is not.

Nathaniel Whittemore notes on The AI Daily Brief that this scarcity has killed the “all-you-can-eat” pricing model, pushing the industry toward usage-based billing. The White House’s move to consider restricting Anthropic’s model rollout based on national security and compute capacity, he argues, marks the beginning of an improvised licensing regime for critical AI infrastructure.

"Getting a significant allotment of B200 chips now requires three-to-five-year commitments and 30% upfront cash."

- Tuhin Srivastava, No Priors

The race is now for ownership of unique workflow data. Srivastava argues application startups like Abridge or Cursor survive the threat from frontier labs by sitting inside specific user workflows, capturing proprietary signals, like a doctor’s edits in an EHR, that become training fuel for specialized, superior models. As inference costs drop, developers don't save money; they build more complex agents, ensuring demand remains effectively infinite.

Source Intelligence

- Deep dive into what was said in the episodes

No More Martyrs | THE UNBOUNDED SERIES: Colonial · May 2

Also from this episode: (10)

Culture (2)

  • Colonial defines sovereignty as territorial control; the entity that controls a territory dictates its rules, a principle he argues is biological and extends to both physical land and digital spaces like servers, keys, and encrypted communications.
  • He argues that, in the regime's eyes, the core crime is sovereignty itself, not criminal activity. Tools like Samourai Wallet were targeted because they helped users stake a claim inside the regime's controlled financial territory.

Politics (1)

  • Colonial asserts that sovereignty must be taken, not granted, because no regime willingly cedes territory or power. He views this as a natural, amoral competition where the goal is to win, not to impose fairness.

Protocol (5)

  • He claims playing by the regime's rules cannot lead to victory, citing the Samourai Wallet case. The new model is anonymous, clandestine building, exemplified by the Ashigaru wallet, which continues Whirlpool mixing without a public legal identity.
  • Colonial argues that without financial privacy, Bitcoin becomes programmable compliance, not sovereign money. He cites the example of Bitcoin meetup attendees afraid to spend $5 due to KYC tax implications as evidence of a 'managed' mindset.
  • He states a critical mass of users adopting privacy tools is needed to prevent a future where only KYC-traced Bitcoin is legally spendable. Without this, non-KYC Bitcoin could become isolated and unusable in the mainstream economy.
  • As a practical step, Colonial strongly recommends using Ashigaru Wallet and its Whirlpool implementation for Bitcoin privacy, directing users to the project's website and available guides for setup.
  • Colonial's essay 'Sovereignty Requires Privacy: Lessons from the Fall of Samourai Wallet' was published shortly after the developers' guilty pleas and resonated widely by framing the event as a clarifying moment in the struggle between sovereign individuals and the surveillance state.

Philosophy (1)

  • Colonial advocates for an 'aristocratic' mindset, borrowing from Evola, which values spiritual severity and discipline over comfort and convenience. This ethos is required to sacrifice public recognition for lasting power and operational security.

Privacy (1)

  • He recommends individuals strategize their exposure by maintaining separate public and private identities, akin to guerrilla warfare. The digital battleground is primarily informational, but poor opsec can spill over into physical consequences.

The Week AI Grew Up · May 1

  • GPU rental prices rose 40% over the last six months, driven by real token demand, not hype. The top two AI labs now generate almost $60 billion in aggregate annual revenue, signaling fundamental strength.
  • A 'vertical wall of demand' exists where every producible AI token will be sold, according to OpenAI CFO Sarah Friar. Compute, not model quality, is the current bottleneck for the industry.
  • GitHub Copilot is shifting to usage-based billing. Microsoft's Satya Nadella stated all per-user services will evolve into per-user plus usage models, reflecting the intensity of AI consumption.
  • Big Tech cloud earnings showed explosive AI-driven growth: AWS revenue was up 28% YoY, Microsoft Azure grew 40% YoY, and Google Cloud beat estimates with 63% YoY growth, triggering a record market cap jump.
  • Anthropic is negotiating a funding round targeting a valuation exceeding $90 billion, potentially surpassing OpenAI's $82.5 billion valuation from March. Some secondary market trades already imply a $1 trillion valuation.
  • Microsoft and OpenAI restructured their deal, granting Microsoft non-revenue-share access to OpenAI's models for five more years and removing the AGI clause. OpenAI is now free to sell models on AWS and Google Cloud.
  • The White House is considering rescinding Anthropic's supply-chain risk designation to allow government use of its models, but some officials oppose a broader rollout of Mythos due to national security and compute capacity concerns.
  • Cursor launched an SDK and OpenAI updated Codex for non-developers, asking users to define their role. This signals a battle over interface philosophy: Claude separates technical and non-technical work, while Codex bets on a unified tool for all.
  • OpenAI's Codex model developed an unexplained fixation on mentioning goblins, gremlins, and other creatures. The company traced it to personality reinforcement learning, where a 'nerdy' training preference spilled over, highlighting how quirks can propagate in model-based training.

Also from this episode: (2)

AI & Tech (2)

  • A viral MS Paint-style prompt instructs AI to redraw images in a 'clumsy, scribbly, and utterly pathetic way,' exemplifying a cultural trend toward low-fidelity, humorous outputs that contrast with the industry's growing maturity.
  • A New York Times op-ed predicts a 'permanent underclass' from AI. Nathaniel Whittemore argues Silicon Valley builders often misjudge AI's real-world economic impact, citing economist Kevin Bryan's view that economists largely reject this permanent underclass thesis.

Baseten CEO Tuhin Srivastava on the AI Inference Crunch, Custom Models, and Building the Inference Cloud · May 1

  • Elad Gil notes that Baseten has grown 30x in the last year and expects to exceed $1 billion in revenue, reflecting the rapid expansion of the AI inference market.
  • Tuhin Srivastava attributes the growth to mainstream adoption of open-source models, which have crossed a capability chasm, and widespread use of post-training techniques for specialized models.
  • Tuhin Srivastava believes an independent application layer will exist because companies leverage unique user signals encoded in workflows, making it difficult for frontier model companies to replicate.
  • Abridge, an ambient scribe used by physicians, exemplifies an application layer company with deep integration into hospital workflows and access to unique user signal for post-training models.
  • Approximately 99% of the eventual AI market, measured by inference count, is enterprise adoption that has yet to come online, indicating significant future growth potential.
  • Tuhin Srivastava states that over 95% of tokens served on Baseten are from custom models where customers modify open-source models with their own data or for performance.
  • Baseten acquired a research team specializing in post-training to accelerate market support and integrate post-training expertise with inference, recognizing their interconnectedness.
  • Tuhin Srivastava believes Chinese open-source models are fantastic with no real evidence of embedded agendas, but emphasizes the U.S. needs to develop its own competitive models.
  • Running models like DeepSeek can be 20% of the cost of proprietary alternatives, offering comparable latency and reliability, making access to such intelligence crucial for national innovation.
  • The AI compute market faces a severe supply crunch with very little slack compute, forcing Baseten to run large clusters at mid-90s utilization across 18 clouds globally.
  • Tuhin Srivastava explains that securing 1,024 B200 GPUs today demands a three-to-five-year contract with a prepay of 20-30% of total contract value, highlighting the capital required to acquire capacity.
  • Baseten maintains high customer stickiness, achieving 400% annual Net Dollar Retention (NDR) due to its comprehensive software layer, which differentiates it from non-sticky 'GPUs as a service'.
  • Tuhin Srivastava acknowledges NVIDIA's strong supply chain, CUDA ecosystem, and developer support make them difficult to surpass in the short term, despite the desirability of a multi-chip world.
  • Jevons Paradox applies to AI inference: decreasing the cost of intelligence leads developers to embed more intelligence into applications, driving greater consumption and better user experiences.

Also from this episode: (1)

AI & Tech (1)

  • Tuhin Srivastava envisions a future where AI provides personalized concierge services for everyone, making everything smarter and leading to the creation of even more software.
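
To put the contract terms Srivastava describes in this episode (1,024 B200 GPUs, three to five years, 20-30% prepay) into rough dollar terms, here is a minimal sketch. The episode summary gives no price per GPU-hour, so the $3.00 rate below is a purely hypothetical placeholder, and the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap days


def contract_prepay(gpus, usd_per_gpu_hour, years, prepay_fraction):
    """Return (total contract value, upfront prepay) in USD."""
    total_contract_value = gpus * usd_per_gpu_hour * HOURS_PER_YEAR * years
    return total_contract_value, total_contract_value * prepay_fraction


if __name__ == "__main__":
    ASSUMED_RATE = 3.00  # hypothetical USD per GPU-hour, NOT from the episode
    for years in (3, 5):
        for fraction in (0.20, 0.30):
            tcv, prepay = contract_prepay(1024, ASSUMED_RATE, years, fraction)
            print(f"{years}y at {fraction:.0%} prepay: "
                  f"TCV ${tcv:,.0f}, upfront ${prepay:,.0f}")
```

Under that assumed rate, the upfront payment alone runs to tens of millions of dollars across the whole range of terms, which is the sense in which capacity acquisition becomes capital-intensive. Swapping in a real quoted rate changes the totals but not the structure of the calculation.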

The AI Bubble Is Widely Misunderstood | Steve Hou · Apr 29

  • Hou distinguishes the AI bubble from the dotcom bubble because AI tools were widely adopted immediately. In the dotcom era, significant unused capacity was built out before being filled.
  • Agentic AI, where models call themselves, changes the compute demand picture completely. Hou estimates this could increase demand by a hundredfold or more, depending on deployment.
  • Hou notes Korean and Taiwanese economies are booming due to exports of chips and memory for the AI buildout.
  • Non-residential construction payrolls are recovering, singularly driven by data center builds, offsetting declines in residential construction.
  • Hou is skeptical of current high productivity readings reflecting AI gains. He attributes them to compositional bias and labor market adjustments post-COVID overhiring.
  • He argues clean causal evidence of AI boosting labor productivity is not yet visible in aggregate data, but that doesn't mean it isn't happening. Anecdotes of efficiency gains are likely valid.
  • Hou is highly skeptical of preemptive Fed rate cuts based on anticipated AI-driven disinflation. He says the direct inflationary impact of the AI buildout, competing for scarce resources, is more immediate.
  • He predicts AI will fundamentally reshape economics through richer modeling and agentic simulations for policy evaluation. It will also democratize advanced econometric tools for researchers.

Also from this episode: (5)

AI & Tech (4)

  • Steve Hou argues the AI investment cycle was inevitable due to epistemic uncertainty, creating a bubble from the start. The question is its size, duration, and current stage, not its existence.
  • Hou believes non-coders underestimated the recent AI acceleration because they don't understand the complex, code-centric questions that drive agentic AI demand.
  • AI's primary GDP impact so far is from the buildout investment, not productivity gains. This investment has cushioned the US economy post-2022 rate hikes.
  • Hou highlights Baumol's cost disease as a key challenge. Inflation is driven by labor-intensive services like childcare and plumbing, sectors where AI's productivity impact will be slowest.

Fed (1)

  • The core US debt arithmetic problem is that tax receipts are a stable 17-20% of GDP, while spending and interest costs rise. Growing the GDP denominator is the primary political option left.
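
The debt arithmetic in this point can be written with the standard debt-to-GDP accounting identity. The notation is introduced here for illustration and does not come from the episode: $d_t$ is debt as a share of GDP, $r$ the nominal interest rate on the debt, $g$ nominal GDP growth, $s_t$ non-interest spending as a share of GDP, and $\tau_t$ tax receipts as a share of GDP.

```latex
$$
d_{t+1} = \frac{1+r}{1+g}\, d_t + \left(s_{t+1} - \tau_{t+1}\right)
$$
```

If $\tau$ stays within the 17-20% band while $s$ and $r$ rise, the only term left that pushes the ratio down is a larger $g$, which is the "grow the denominator" option described above.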

Power ranges: AI faces supply crunch · Apr 29

  • OpenAI shut down its Sora video generation tool to allocate scarce computing resources toward more lucrative ventures, reflecting an industry-wide AI compute shortage.
  • Weekly AI token processing on OpenRouter quadrupled from January to March 2024, illustrating surging AI demand that hardware cannot match.
  • Five major U.S. cloud providers, including Amazon, Meta, and Microsoft, will spend close to $700 billion on AI data center buildouts this year.
  • Data center construction faces local opposition over electricity, land, and water usage, causing project delays amid the urgent AI capacity push.
  • NVIDIA supplies over two-thirds of the world's AI processing power, but its chips are sold out, forcing companies to use hardware that is two to three years old.
  • TSMC is the sole manufacturer for most advanced AI chips. Its capital expenditures are increasing by $60 billion this year, but capacity remains constrained.
  • Elon Musk's proposed 'TerraFab' aims to exceed all current chip fabrication capacity by 2030, a project analysts estimate would cost $5 to $13 trillion.
  • A prolonged AI supply crunch could reverse the trend of falling inference prices, leading to higher costs for users and potentially slowing AI adoption.
  • A sophisticated spyware attack in Indonesia used a fake tax app to steal biometric data and drain over $26,000 from a charity accountant's bank accounts.

Also from this episode: (5)

AI & Tech (4)

  • Criminal groups now operate a 'malware as a service' model, buying and selling stolen data and malicious software on platforms like Telegram to execute rapid, personalized attacks.
  • The global cybercrime industry is estimated to generate $500 billion annually, a scale comparable to the global illicit drug trade.
  • Security firm Infoblox identified a software cluster targeting victims in over 20 countries, with criminals integrating AI chatbots and deepfake tools to enhance attacks.
  • Allbirds is abandoning its footwear business, selling all shoe assets and rebranding as Newbird AI to pivot towards AI compute infrastructure.

Business (1)

  • Millennial-focused direct-to-consumer brands like Allbirds face pressure from rising interest rates, expensive online ad markets, and competition from larger, established companies.