The subsidy era for AI is over. Baseten CEO Tuhin Srivastava reports clusters running at mid-90s utilization, with zero slack compute. Getting a significant allotment of the latest GPUs now requires three-to-five-year contracts and 30% cash upfront, turning AI startups into capital-intensive operations overnight.
This structural scarcity is not just a chip shortage. On The Intelligence from The Economist, Shailesh Chitnis noted that lead times for electrical transformers and switches now stretch to five years. Even with giants like Amazon and Microsoft spending hundreds of billions, money can't instantly create land or water, or overcome local opposition to new data centers.
Steve Hou argues on Forward Guidance that this physical friction is the primary economic impact so far. The massive capital investment is cushioning the US economy, but it’s also competing for energy, hardware, and specialized trades like electricians and plumbers. Hou warns the Federal Reserve against cutting rates based on hoped-for AI-driven disinflation, saying the direct inflationary pressure from the buildout is more immediate.
"The immediate impact is inflationary. Huge capital investments in data centers compete for energy, hardware, and specialized labor."
- Steve Hou, Forward Guidance
As compute tightens, the market is moving beyond raw models. Srivastava says that over 95% of tokens served on Baseten come from custom models that customers post-train on their own reward signals. This creates massive stickiness: GPUs are a commodity, but the software layer managing these bespoke workloads is not.
Nathaniel Whittemore notes on The AI Daily Brief that this scarcity has killed the “all-you-can-eat” pricing model, pushing the industry toward usage-based billing. The White House’s move to consider restricting Anthropic’s model rollout on national security and compute capacity grounds, he argues, marks the beginning of an improvised licensing regime for critical AI infrastructure.
"Getting a significant allotment of B200 chips now requires three-to-five-year commitments and 30% upfront cash."
- Tuhin Srivastava, No Priors
The race is now for ownership of unique workflow data. Srivastava argues that application startups like Abridge or Cursor survive the threat from frontier labs by sitting inside specific user workflows, where they capture proprietary signals (a doctor’s edits in an EHR, for example) that become training fuel for specialized, superior models. As inference costs drop, developers don't save money; they build more complex agents, ensuring demand remains effectively infinite.




