AI & TECH

Equity-for-compute deals reveal AI's physical wall

Friday, May 1, 2026 · from 3 podcasts
  • Labs trade ownership for power and cables, as seen in Anthropic's $73B deals with Google and Amazon.
  • Hardware shortages force reliance on three-year-old chips while criminal AI scales.
  • Rack cable density now dictates model architecture, not algorithmic breakthroughs.

Anthropic traded future equity for $73 billion in cloud compute because the true currency is no longer cash, but physical infrastructure. On Moonshots, the analysts argued these deals secure survival, not just capital, as labs face a recursive bottleneck: you need chips to build models, and models to afford chips.

This supply crunch has become a throttle. The Intelligence reports that NVIDIA chips are sold out, forcing firms to use 2-3 year old hardware, while lead times for electrical transformers stretch to five years. Money is chasing a wall of land, water, and power.

"The supply crunch acts as a natural brake on the industry. Tech giants like Amazon and Microsoft are spending $700 billion on data centers this year, but money cannot buy land, water, or electricity where local opposition is mounting."

- The Intelligence from The Economist

The physical constraints are reshaping technical design. Reiner Pope explains on the Dwarkesh Podcast that high-speed Mixture of Experts models are limited by rack size. The all-to-all communication between experts works only within a single rack's dense NVLink network; crossing to another rack is eight times slower.

Cable density and bend radius, not compute theory, now cap the number of experts a model can use. Pope notes the leap from NVIDIA's Hopper to Blackwell was less about the chip and more about how many GPUs could fit within the same high-speed cable domain.

"The primary constraint on increasing rack size is physical: cable density, bend radius, weight, and cooling, not a fundamental technical barrier."

- Reiner Pope, Dwarkesh Podcast
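
To put the in-rack versus cross-rack penalty in concrete terms, here is a rough Python sketch of per-layer expert dispatch time. Only the roughly 8x gap between the scale-up and scale-out networks comes from the episode; the hidden size, top-k, activation precision, and absolute NVLink bandwidth are illustrative assumptions.

```python
# Back-of-envelope: time to dispatch tokens to experts for one MoE layer,
# inside a rack versus across racks. Only the ~8x bandwidth gap is from the
# episode; every other number below is an illustrative assumption.

HIDDEN_DIM = 7168          # activation width per token (assumed)
TOP_K = 8                  # experts activated per token (assumed)
BYTES_PER_VALUE = 2        # bf16 activations (assumed)
SCALE_UP_BW = 900e9        # bytes/s per GPU over in-rack NVLink (assumed)
SCALE_OUT_BW = SCALE_UP_BW / 8   # leaving the rack is ~8x slower (episode)

def dispatch_time_ms(batch_tokens: int, bandwidth: float) -> float:
    """All-to-all time for one MoE layer: send activations to experts and back."""
    bytes_moved = 2 * batch_tokens * TOP_K * HIDDEN_DIM * BYTES_PER_VALUE
    return bytes_moved / bandwidth * 1e3

for batch in (512, 2048):
    in_rack = dispatch_time_ms(batch, SCALE_UP_BW)
    cross_rack = dispatch_time_ms(batch, SCALE_OUT_BW)
    print(f"batch {batch}: in-rack {in_rack:.2f} ms, cross-rack {cross_rack:.2f} ms")
```

At a few milliseconds per layer on the slow path, cross-rack dispatch multiplied over dozens of MoE layers would blow well past a roughly 20 ms decode step, which is why expert parallelism is kept inside the rack's scale-up domain.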

The race is becoming vertically integrated. Google, which already controls an estimated 25% of global AI compute, is using its own AI to design its eighth-generation TPU chips. The goal is total silicon-to-software sovereignty, maximizing economic value per token from an owned stack.

While labs fight over physical real estate, automated crime scales alongside them. Criminal groups use AI-powered 'malware as a service' to steal biometrics and drain accounts, a $500 billion industry evolving faster than defenses. The frontier is splitting between those who control the physical layer and those left to rent it.

Source Intelligence

- Deep dive into what was said in the episodes

Google Invests $40B Into Anthropic, GPT 5.5 Drops, and Google Cloud Dominates | EP #252 · Apr 30

  • Anthropic trades massive equity for infrastructure access as the training bottleneck shifts to power and fabs.
  • Frontier models are self-improving at a rate that renders human-led benchmarking nearly obsolete.
  • Google’s eighth-gen TPUs, designed by AI, signal a shift toward total silicon-to-software integration.

Reiner Pope – The math behind how LLMs are trained and served · Apr 29

  • Reiner Pope explains that batch size is the key variable driving the trade-off between inference latency and cost. Batching amortizes the fixed cost of fetching model weights across many user requests.
  • Without batching, serving a large model is uneconomical. Pope states the cost can be a thousand times worse than when batching just two users together.
  • A roofline model for inference time combines compute time and memory fetch time. Compute time scales linearly with batch size, while memory time includes a constant for weights and a term linear in batch size for the KV cache.
  • There is a hard lower bound on inference latency set by the time needed to read all the model's total parameters from memory into the chips, which is independent of batch size.
  • Pope solves for the batch size where compute and memory times are balanced. The formula is batch size >= (Flops / Memory Bandwidth) * (Total Params / Active Params), where the hardware ratio Flops/Bandwidth is ~300 (see the sketch after this list).
  • This balance point implies the optimal batch size is approximately 300 times the model's ratio of total to active parameters. For DeepSeek's 32-of-256 expert sparsity, a ratio of 8, this yields a batch size around 2000-3000 tokens.
  • In a scheduled system, a new inference 'train' departs every 20 milliseconds. Worst-case latency for a user is 40ms if they just miss a departure and must wait for the next train to complete.
  • The 20ms schedule is derived from the time to read the entire HBM capacity. For a Rubin-generation system with 288GB HBM and 20 TB/s bandwidth, this is about 15ms.
  • Pope argues increasing sparsity is a pure win for inference cost, as it reduces the active parameters and thus compute time. However, it demands larger batch sizes to amortize weight fetches and consumes more memory capacity.
  • Mixture-of-experts layers use expert parallelism, where different experts are placed on different GPUs. This creates an all-to-all communication pattern that is optimal within a single rack's high-bandwidth scale-up network.
  • Leaving the rack uses a scale-out network about eight times slower than the internal NVLink. This makes crossing rack boundaries for expert parallelism a severe bottleneck.
  • Pope states the primary constraint on increasing rack size is physical: cable density, bend radius, weight, and cooling, not a fundamental technical barrier.
  • Pipeline parallelism, which places different model layers on different racks, is viable for inference because the communication pattern is point-to-point rather than all-to-all, making scale-out latency manageable.
  • Pope argues the value of large scale-up domains like Google's or NVIDIA's Rubin is not primarily memory capacity, but memory bandwidth, which directly lowers inference latency and enables longer context lengths.
  • He presents a heuristic cost model for model development: total cost = pre-training cost + RL cost + inference cost. He conjectures labs roughly equalize these three costs.
  • Applying this model, Pope estimates frontier models are overtrained by a factor of about 100 relative to the compute-optimal Chinchilla scaling law, due to the need to amortize training compute over vast inference usage.
  • Pope reverse-engineers API pricing to deduce system bottlenecks. Gemini charging more for contexts over 200K tokens suggests a memory-to-compute crossover point near that length.
  • Output tokens being ~5x more expensive than input tokens indicates decode is memory-bandwidth bound, while pre-fill is compute-bound, as pre-fill amortizes memory costs over many tokens.
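
Much of this arithmetic can be reproduced in a few lines. The sketch below is a minimal Python rendering of the decode-step roofline described above: compute time grows linearly with batch size, memory time is a fixed weight-read term plus a KV-cache term, and their crossover gives the critical batch size. The ~300 flops-per-byte ratio and the 288 GB / 20 TB/s HBM figures are the ones quoted in this episode; the model shape (roughly 640B total parameters with 32 of 256 experts active), bf16 weights, per-token KV-cache bytes, and 16-way sharding are illustrative assumptions.

```python
# Schematic decode-step roofline, following the balance argument above.
# Hardware numbers echo the episode (flops/byte ~300, 288 GB HBM, 20 TB/s per
# chip); the model shape and sharding factor are illustrative assumptions.

MEM_BW = 20e12                       # bytes/s HBM bandwidth per chip (episode)
FLOPS = 300 * MEM_BW                 # per-chip flops, so flops-per-byte is ~300
TOTAL_PARAMS = 640e9                 # total parameters, DeepSeek-style MoE (assumed)
ACTIVE_PARAMS = TOTAL_PARAMS * 32 / 256   # 32 of 256 experts active per token
BYTES_PER_PARAM = 2                  # bf16 weights (assumed)
KV_BYTES_PER_TOKEN = 8e3             # KV-cache bytes read per token per chip (assumed)
NUM_CHIPS = 16                       # chips the model is sharded across (assumed)

def step_time_ms(batch: int) -> float:
    """One decode step at a given global batch size, per-chip roofline."""
    compute = 2 * batch * (ACTIVE_PARAMS / NUM_CHIPS) / FLOPS            # linear in batch
    weight_read = TOTAL_PARAMS * BYTES_PER_PARAM / NUM_CHIPS / MEM_BW    # fixed floor
    kv_read = batch * KV_BYTES_PER_TOKEN / MEM_BW                        # linear in batch
    return max(compute, weight_read + kv_read) * 1e3                     # slower side wins

# Critical batch where compute catches up with the weight read:
# ~ (flops / bandwidth) * (total / active) = ~300 * 8 = ~2400 here.
critical = (FLOPS / MEM_BW) * (TOTAL_PARAMS * BYTES_PER_PARAM) / (2 * ACTIVE_PARAMS)
print(f"critical batch ~ {critical:.0f} tokens")

for b in (256, 2400, 8000):
    print(f"batch {b:>5}: step ~ {step_time_ms(b):.1f} ms")

# The 'train schedule' floor quoted above: sweeping a full 288 GB HBM stack.
print(f"HBM sweep ~ {288e9 / MEM_BW * 1e3:.1f} ms")
```

With these numbers the crossover lands near 2,400 tokens, matching the 2,000-3,000 figure above, and the full-HBM sweep lands near 14 ms, the basis of the roughly 20 ms 'train' schedule.
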
Also from this episode: (2)

Models (1)

  • Empirical research on mixture-of-experts shows model quality can increase with sparsity. An older paper found a 64-expert model with 270M active parameters matched the quality of a dense 1.3B parameter model.

AI & Tech (1)

  • Pipelining reduces the memory capacity needed per rack for model weights but does not reduce the memory needed for the KV cache, which becomes the dominant memory consumer.
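
One way to make this concrete, as a rough sketch rather than anything stated in the episode: splitting layers across pipeline stages divides the weight bytes each rack must hold, but keeping every stage busy means carrying one microbatch per stage, and each stage stores the KV cache for its own layers of every in-flight sequence, so the per-rack KV footprint stays put. All numbers below are illustrative assumptions.

```python
# Per-rack memory under pipeline parallelism: weights shrink with the number
# of stages, KV cache does not. All model and workload numbers are assumptions.

TOTAL_WEIGHT_BYTES = 1.3e12          # ~650B params in bf16 (assumed)
NUM_LAYERS = 64                      # transformer layers (assumed)
KV_BYTES_PER_TOKEN_PER_LAYER = 2e3   # KV cache per token per layer (assumed)
SEQS_PER_MICROBATCH = 500            # sequences per pipeline microbatch (assumed)
CONTEXT_LEN = 16_000                 # average KV tokens kept per sequence (assumed)

def per_rack_memory_gb(num_stages: int) -> tuple[float, float]:
    """Weight and KV-cache bytes held by a single pipeline stage (one rack)."""
    weights = TOTAL_WEIGHT_BYTES / num_stages      # each rack holds only its layers
    layers_here = NUM_LAYERS / num_stages
    # Keeping the pipeline full needs one microbatch per stage in flight, and a
    # stage keeps KV for its layers of every in-flight sequence.
    active_seqs = num_stages * SEQS_PER_MICROBATCH
    kv = active_seqs * CONTEXT_LEN * layers_here * KV_BYTES_PER_TOKEN_PER_LAYER
    return weights / 1e9, kv / 1e9

for stages in (1, 4, 16):
    w, kv = per_rack_memory_gb(stages)
    print(f"{stages:>2} stages: weights {w:6.0f} GB/rack, KV cache {kv:6.0f} GB/rack")
```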

Power ranges: AI faces supply crunch · Apr 29

  • OpenAI shut down its Sora video generation tool to allocate scarce computing resources toward more lucrative ventures, reflecting an industry-wide AI compute shortage.
  • Weekly AI token processing on OpenRouter quadrupled from January to March 2024, illustrating surging AI demand that hardware cannot match.
  • Five major U.S. cloud providers, including Amazon, Meta, and Microsoft, will spend close to $700 billion on AI data center buildouts this year.
  • Data center construction faces local opposition over electricity, land, and water usage, causing project delays amid the urgent AI capacity push.
  • NVIDIA supplies over two-thirds of the world's AI processing power, but its chips are sold out, forcing companies to fall back on hardware that is 2-3 years old.
  • TSMC is the sole manufacturer for most advanced AI chips. Its capital expenditures are increasing by $60 billion this year, but capacity remains constrained.
  • Elon Musk's proposed 'TerraFab' aims to exceed all current chip fabrication capacity by 2030, a project analysts estimate would cost $5 to $13 trillion.
  • A prolonged AI supply crunch could reverse the trend of falling inference prices, leading to higher costs for users and potentially slowing AI adoption.
Also from this episode: (6)

AI & Tech (5)

  • A sophisticated spyware attack in Indonesia used a fake tax app to steal biometric data and drain over $26,000 from a charity accountant's bank accounts.
  • Criminal groups now operate a 'malware as a service' model, buying and selling stolen data and malicious software on platforms like Telegram to execute rapid, personalized attacks.
  • The global cybercrime industry is estimated to generate $500 billion annually, a scale comparable to the global illicit drug trade.
  • Security firm Infoblox identified a software cluster targeting victims in over 20 countries, with criminals integrating AI chatbots and deepfake tools to enhance attacks.
  • Allbirds is abandoning its footwear business, selling all shoe assets and rebranding as Newbird AI to pivot towards AI compute infrastructure.

Business (1)

  • Millennial-focused direct-to-consumer brands like Allbirds face pressure from rising interest rates, expensive online ad markets, and competition from larger, established companies.