
BUSINESS

David Sacks says the US must win the AI race or lose global market share

Tuesday, May 5, 2026 · from 6 podcasts
  • Zero slack compute forces tech giants into multi-year, prepaid power deals at double the spot rate, morphing them into leveraged industrial utilities.
  • The U.S. must embrace 'permissionless innovation' to maintain dominance as China controls half the global developer pool, according to David Sacks.
  • Real-time AI inference is hitting physical limits of cable density and memory bandwidth, not just chip shortages.

The trillion-dollar AI buildout is slamming into a wall of transformers, plumbers, and copper cables. While software updates ship in weeks, physical infrastructure takes years, creating a supply choke that is reshaping the tech industry’s fundamental economics. According to David Sacks on The Conversation with Dasha Burns, the U.S. faces an 'infinite game' for global market share where slowing domestic innovation merely offshores progress to adversaries.

Hyperscalers are responding by becoming energy companies. Amazon, Microsoft, Google, and Meta have guided to combined 2026 capital expenditures of about $725 billion, crushing free cash flow. As Chamath Palihapitiya noted on All-In, they are locking in long-term power purchase agreements at rates more than double the spot price just to guarantee supply. The asset-light software era is over; these firms are becoming highly leveraged, capital-intensive industrial utilities.

The compute crunch is absolute. Baseten CEO Tuhin Srivastava reports his clusters run at mid-90s utilization with 'zero slack,' and says securing capacity now requires three-to-five-year contracts with 20-30% of total contract value prepaid. This isn't a temporary shortage but a structural shift in which access to GPUs is a strategic barrier. The scarcity is so severe that even with this spending, firms are throttling service: Anthropic’s Opus 4.7 is currently 'compute-gated,' pushing users to older models.

“If global data centers run on Huawei chips and DeepSeek models five years from now, the U.S. loses its primary lever of soft power.”

- David Sacks, The Conversation with Dasha Burns

Supply constraints are exposing physical bottlenecks far beyond silicon. Reiner Pope explained on the Dwarkesh Podcast that a key limit for scaling massive Mixture-of-Experts models is the physical space for copper cables within a rack, which dictates how many GPUs can communicate at high speed. Meanwhile, the immediate economic impact is inflationary, not deflationary. Steve Hou argued on Forward Guidance that the buildout competes for scarce resources, an effect more immediate than any AI-driven disinflation and one that leaves the Fed little room for preemptive rate cuts. Sacks, for his part, cited wage gains of 25-30% for electricians and plumbers.

Sacks advocates a federal regulatory framework focused on specific harms like child safety, pre-empting a patchwork of 1,200 state-level AI bills he sees as knee-jerk governance. His proposed 'ratepayer protection pledge' would ease permitting for data centers if companies bring their own power infrastructure, turning AI firms into energy providers that sell excess back to the grid. This is a pragmatic recognition that the buildout's fate hinges on local politics and physical logistics.

“We are currently over-provisioning on memory capacity just to get the necessary bandwidth. We have a surplus of space but a deficit of speed.”

- Reiner Pope, Dwarkesh Podcast

The competition extends beyond infrastructure. Sacks cited a stark optimism gap: 83% of Chinese respondents believe AI will be more beneficial than harmful, compared to under 40% of Americans. He views this cultural divergence as a bigger threat to U.S. leadership than any single technical setback. The race isn't just to build bigger models, but to define the global ecosystem in which they operate.

The path forward is through specialization, not sheer scale. Srivastava noted that over 95% of Baseten’s inference traffic uses custom-tuned models, not raw open-source weights. Companies like Abridge survive not by outspending frontier labs, but by owning deep user workflows - like clinician interactions - that generate proprietary training signals. This suggests the ultimate winners will control unique data loops, not just massive compute.

Source Intelligence

- Deep dive into what was said in the episodes

The Conversation with Dasha Burns

David Sacks: Can AI solve the problems it creates? · May 2

  • He advocates a 'permissionless innovation' regulatory framework with minimal burdens to keep the U.S. ahead. Sacks says innovation originates in the private sector and the government's role should be encouraging.
  • He identifies specific areas for state-level regulation: online child safety, data center impacts on electricity rates, and creator protections. His 'north star' for child safety is parental empowerment over app usage.
  • The administration supports a 'ratepayer protection pledge' where AI companies building new data centers agree not to increase residential electricity prices, with the quid pro quo being easier permitting if they bring their own power.
  • Sacks is skeptical of holding AI developers broadly liable for end-user actions, comparing it to holding Gmail or Excel responsible for crimes committed using their services. He says it's hard for developers to know all use cases.
  • On the Anthropic-Pentagon dispute, Sacks believes it was unrealistic for the company to demand a veto over lawful military uses after deciding to sell to the Department of War. He says concerns about surveillance loopholes should be addressed by changing laws, not terms of service.
Also from this episode: (10)

AI & Tech (10)

  • Sacks argues the U.S. must win a global AI race against competitors like China to protect national security and the economy, framing it as an 'infinite game' without a finish line.
  • Sacks cites Trump's AI policy pillars: pro-innovation, pro-energy infrastructure to power data centers, and pro-export to gain global market share for American chips and models.
  • He disagrees with Elon Musk's more pessimistic view of AI as an existential threat. Sacks believes the biggest dystopian risk is government using AI for surveillance and control, not a Terminator-like scenario.
  • Sacks views the AI-enhanced cybersecurity arms race as one AI will solve. He argues tools like Anthropic's Mythos will help defenders find and patch vulnerabilities before hackers exploit them, reaching a new equilibrium.
  • He points to a Stanford study showing a stark optimism gap: 83% of Chinese respondents believe AI will be more beneficial than harmful, compared to under 40% of Americans. Sacks calls this the biggest threat to U.S. leadership.
  • Sacks says current data does not support widespread AI-driven job loss. He cites a Yale Budget Lab study finding no discernible labor market disruption in the three years after ChatGPT's launch and the Challenger Gray report attributing less than 5% of 2023 layoffs to AI.
  • He highlights an AI-driven construction boom, with $650 billion in data center capex this year acting as a 2% GDP tailwind and boosting blue-collar wages for electricians and plumbers by 25-30%.
  • Sacks argues AI won't eliminate coding jobs but will shift them toward prompting and supervising models. He notes demand for software engineers rose 10% year-over-year even as AI coding tools proliferated.
  • He claims Anthropic's enterprise revenue from coding tools scaled from about $10 billion to $30 billion between January and March 2024, calling the growth unprecedented.
  • Sacks criticizes well-funded 'doomer' groups and super PACs that want to halt AI progress, alleging they have astroturfed NIMBY backlash against data centers and influenced media discourse.

OpenAI Misses Targets, Codex vs Claude, Elon vs Sam Trial, Big Hyperscaler Beats, Peptide Craze · May 1

  • OpenAI missed its target of reaching one billion weekly active users for ChatGPT by the end of 2025 and is still short of that milestone in 2026. It also missed its 2025 revenue target.
  • OpenAI has $600 billion in spending commitments for compute, a figure roughly equal to its secondary market valuation. CFO Sarah Friar is reportedly concerned about revenue growth keeping pace with these expenses.
  • David Sacks argues OpenAI's recent product release, GPT 5.5, has received strong reviews from developers and is taking coding market share from Anthropic's Opus 4.7, which users complain is compute-constrained and buggy.
  • Polymarket shows a 32% chance OpenAI goes public by the end of 2026, down from 60% in December, reflecting market skepticism about its IPO timeline.
  • Chamath Palihapitiya contends the fundamental constraint for AI companies is power supply, not demand, and that supply chain delays for grid infrastructure will hurt OpenAI and Anthropic while benefiting hyperscalers like Oracle, Amazon, Meta, Microsoft, and Google.
  • David Friedberg cites a BCG 'Rule of Three' theory, predicting mature AI markets will evolve to a 4:2:1 market share ratio between the top three players, with OpenAI and Google currently leading in consumer and Google leading in enterprise.
  • Friedberg highlights an MIT paper on neural network pruning, showing models can be reduced in size by 90% for the same accuracy, potentially cutting inference costs by 10x and dramatically improving energy efficiency.
  • In the Elon Musk vs. OpenAI trial, excerpts from Greg Brockman's diary suggest the OpenAI team wanted to transition to a for-profit structure and remove Musk as an investor, which Musk claims breaches charitable trust.
  • Polymarket odds give Elon Musk a 42-43% chance of winning his lawsuit against OpenAI, with speculation that a win might only result in him being credited back his initial $40 million investment.
  • Major hyperscalers announced massive 2026 capital expenditure guidance: Amazon at $200 billion, Microsoft and Google at $190 billion each, and Meta at $145 billion, signaling a trillion-dollar infrastructure buildout driven by AI and cloud demand.
  • Friedberg attended the Supreme Court hearing for Monsanto (Bayer) vs. Hardeman, a case testing federal preemption of pesticide labels under FIFRA against state failure-to-warn laws, with Bayer having paid out $10 billion and reserving another $10 billion for 90,000 outstanding cases.
Also from this episode: (5)

AI & Tech (2)

  • Sacks notes OpenAI released GPT 5.5 Cyber, which matches Anthropic's Mythos in completing multi-step cyber attack simulations, and is likely the first such model commercially available due to OpenAI's superior compute capacity.
  • Sacks argues AI cyber models like Mythos or GPT 5.5 Cyber don't create vulnerabilities but discover existing bugs, and their proliferation will lead to a one-time upgrade cycle as systems are hardened, followed by a new equilibrium between offense and defense.

Business (2)

  • This CAPEX surge is crushing free cash flow: Amazon's is down 97%, while Google, Microsoft, and Meta are down 12%, 12%, and 8% respectively, as companies pivot from shareholder returns to heavy infrastructure investment.
  • Chamath Palihapitiya predicts hyperscalers will become highly leveraged, debt-heavy industrial businesses within five years as they lock in long-term power contracts at rates more than 2x the prevailing spot price.
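
The capex guidance quoted in this episode's notes can be tallied in a few lines. This is a minimal sketch: the dollar figures are the ones listed above, while the dictionary and function names are illustrative assumptions rather than anything from the episode.

```python
# 2026 capex guidance in billions of USD, as quoted in the episode notes above.
CAPEX_GUIDANCE_BN = {
    "Amazon": 200,
    "Microsoft": 190,
    "Google": 190,
    "Meta": 145,
}


def total_capex_bn(guidance, companies=None):
    """Sum guidance for the named companies, or for all of them if none are named."""
    names = guidance.keys() if companies is None else companies
    return sum(guidance[name] for name in names)


if __name__ == "__main__":
    print("All four:", total_capex_bn(CAPEX_GUIDANCE_BN))
    print("Excluding Meta:", total_capex_bn(CAPEX_GUIDANCE_BN, ["Amazon", "Microsoft", "Google"]))
```

On these figures the four companies together guide to $725 billion, and $580 billion without Meta.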

Science (1)

  • David Friedberg cites phase three trial data for Eli Lilly's Retatrutide, a triple agonist peptide, showing an average weight loss of 37 pounds in 40 weeks, an 80% reduction in liver fat, and significant drops in A1C, non-HDL cholesterol, and triglycerides.

Baseten CEO Tuhin Srivastava on the AI Inference Crunch, Custom Models, and Building the Inference Cloud · May 1

  • The AI compute market faces a severe supply crunch with very little slack compute, forcing Baseten to run large clusters at mid-90s utilization across 18 clouds globally.
  • Tuhin Srivastava explains that securing 1,024 B-200 GPUs today demands a three to five-year contract with a 20-30% total contract value prepay, highlighting capital requirements for capacity acquisition.
  • Baseten maintains high customer stickiness, achieving 400% annual Net Dollar Retention (NDR) due to its comprehensive software layer, which differentiates it from non-sticky 'GPUs as a service'.
  • Tuhin Srivastava acknowledges NVIDIA's strong supply chain, CUDA ecosystem, and developer support make them difficult to surpass in the short term, despite the desirability of a multi-chip world.
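
To get a feel for the capital involved in the contract terms described above, here is a rough sketch of the prepay arithmetic. The GPU count, contract length, and prepay range come from the notes; the hourly price per GPU is NOT given in the episode and is a purely hypothetical placeholder, as are the function and variable names.

```python
HOURS_PER_YEAR = 365 * 24


def contract_prepay(num_gpus, hourly_rate_usd, years, prepay_fraction):
    """Return (total contract value, upfront prepay) in USD for a fixed-term GPU lease."""
    total_contract_value = num_gpus * hourly_rate_usd * HOURS_PER_YEAR * years
    return total_contract_value, total_contract_value * prepay_fraction


if __name__ == "__main__":
    hypothetical_rate = 3.00  # USD per GPU-hour: an assumption, not a quoted price
    for years in (3, 5):
        for fraction in (0.20, 0.30):
            tcv, prepay = contract_prepay(1024, hypothetical_rate, years, fraction)
            print(f"{years} years, {fraction:.0%} prepay: TCV ${tcv:,.0f}, upfront ${prepay:,.0f}")
```

Swapping in a real quoted rate changes the totals linearly, so the script is only useful for comparing terms, not for estimating actual prices.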
Also from this episode: (11)

AI & Tech (11)

  • Alad notes that Baseten has grown 30x in the last year and expects to exceed $1 billion in revenue, reflecting the rapid expansion of the AI inference market.
  • Tuhin Srivastava attributes the growth to mainstream adoption of open-source models, which have crossed a capability chasm, and widespread use of post-training techniques for specialized models.
  • Tuhin Srivastava believes an independent application layer will exist because companies leverage unique user signals encoded in workflows, making it difficult for frontier model companies to replicate.
  • Abridge, an ambient scribe used by physicians, exemplifies an application layer company with deep integration into hospital workflows and access to unique user signal for post-training models.
  • Approximately 99% of the eventual AI market, measured by inference count, is enterprise adoption that has yet to come online, indicating significant future growth potential.
  • Tuhin Srivastava states that over 95% of tokens served on Baseten are from custom models where customers modify open-source models with their own data or for performance.
  • Baseten acquired a research team specializing in post-training to accelerate market support and integrate post-training expertise with inference, recognizing their interconnectedness.
  • Tuhin Srivastava believes Chinese open-source models are fantastic with no real evidence of embedded agendas, but emphasizes the U.S. needs to develop its own competitive models.
  • Running models like DeepSeek can be 20% of the cost of proprietary alternatives, offering comparable latency and reliability, making access to such intelligence crucial for national innovation.
  • Jevons Paradox applies to AI inference: decreasing the cost of intelligence leads developers to embed more intelligence into applications, driving greater consumption and better user experiences.
  • Tuhin Srivastava envisions a future where AI provides personalized concierge services for everyone, making everything smarter and leading to the creation of even more software.

The AI Bubble Is Widely Misunderstood | Steve Hou · Apr 29

  • Agentic AI, where models call themselves, changes the compute demand picture completely. Hou estimates this could increase demand by a hundredfold or more, depending on deployment.
  • Hou notes Korean and Taiwanese economies are booming due to exports of chips and memory for the AI buildout.
  • Non-residential construction payrolls are recovering, singularly driven by data center builds, offsetting declines in residential construction.
  • The core US debt arithmetic problem is that tax receipts are a stable 17-20% of GDP, while spending and interest costs rise. Growing the GDP denominator is the primary political option left.
Also from this episode: (9)

AI & Tech (8)

  • Steve Hou argues the AI investment cycle was inevitable due to epistemic uncertainty, creating a bubble from the start. The question is its size, duration, and current stage, not its existence.
  • Hou distinguishes the AI bubble from the dotcom bubble because AI tools were widely adopted immediately. In the dotcom era, significant unused capacity was built out before being filled.
  • Hou believes non-coders underestimated the recent AI acceleration because they don't understand the complex, code-centric questions that drive agentic AI demand.
  • AI's primary GDP impact so far is from the buildout investment, not productivity gains. This investment has cushioned the US economy post-2022 rate hikes.
  • Hou is skeptical of current high productivity readings reflecting AI gains. He attributes them to compositional bias and labor market adjustments post-COVID overhiring.
  • He argues clean causal evidence of AI boosting labor productivity is not yet visible in aggregate data, but that doesn't mean it isn't happening. Anecdotes of efficiency gains are likely valid.
  • Hou highlights Baumol's cost disease as a key challenge. Inflation is driven by labor-intensive services like childcare and plumbing, sectors where AI's productivity impact will be slowest.
  • He predicts AI will fundamentally reshape economics through richer modeling and agentic simulations for policy evaluation. It will also democratize advanced econometric tools for researchers.

Fed (1)

  • Hou is highly skeptical of preemptive Fed rate cuts based on anticipated AI-driven disinflation. He says the direct inflationary impact of the AI buildout, competing for scarce resources, is more immediate.

Reiner Pope – The math behind how LLMs are trained and served · Apr 29

  • Reiner Pope explains that batch size is the key variable driving the trade-off between inference latency and cost. Batching amortizes the fixed cost of fetching model weights across many user requests.
  • Without batching, serving a large model is uneconomical. Pope states the cost can be a thousand times worse than when batching just two users together.
  • A roofline model for inference time combines compute time and memory fetch time. Compute time scales linearly with batch size, while memory time includes a constant for weights and a term linear in batch size for the KV cache.
  • There is a hard lower bound on inference latency set by the time needed to read all the model's total parameters from memory into the chips, which is independent of batch size.
  • Pope solves for the batch size where compute and memory times are balanced. The formula is batch size >= (FLOPs / memory bandwidth) * (total params / active params), where the hardware ratio FLOPs/bandwidth is ~300.
  • This balance point implies the optimal batch size is approximately 300 divided by the fraction of parameters that are active. For DeepSeek's sparsity of 32/256 (one-eighth of parameters active), this yields 300 * 8 = 2,400, a batch size around 2000-3000 tokens.
  • In a scheduled system, a new inference 'train' departs every 20 milliseconds. Worst-case latency for a user is 40ms if they just miss a departure and must wait for the next train to complete.
  • The 20ms schedule is derived from the time to read the entire HBM capacity. For a Rubin-generation system with 288GB HBM and 20 TB/s bandwidth, this is about 15ms.
  • Pope argues increasing sparsity is a pure win for inference cost, as it reduces the active parameters and thus compute time. However, it demands larger batch sizes to amortize weight fetches and consumes more memory capacity.
  • Mixture-of-experts layers use expert parallelism, where different experts are placed on different GPUs. This creates an all-to-all communication pattern that is optimal within a single rack's high-bandwidth scale-up network.
  • Leaving the rack uses a scale-out network about eight times slower than the internal NVLink. This makes crossing rack boundaries for expert parallelism a severe bottleneck.
  • Pope states the primary constraint on increasing rack size is physical: cable density, bend radius, weight, and cooling, not a fundamental technical barrier.
  • Pipeline parallelism, which places different model layers on different racks, is viable for inference because the communication pattern is point-to-point rather than all-to-all, making scale-out latency manageable.
  • Pope argues the value of large scale-up domains like Google's or NVIDIA's Rubin is not primarily memory capacity, but memory bandwidth, which directly lowers inference latency and enables longer context lengths.
  • He presents a heuristic cost model for model development: total cost = pre-training cost + RL cost + inference cost. He conjectures labs roughly equalize these three costs.
  • Applying this model, Pope estimates frontier models are overtrained by a factor of about 100 relative to the compute-optimal Chinchilla scaling law, due to the need to amortize training compute over vast inference usage.
  • Pope reverse-engineers API pricing to deduce system bottlenecks. Gemini charging more for contexts over 200K tokens suggests a memory-to-compute crossover point near that length.
  • Output tokens being ~5x more expensive than input tokens indicates decode is memory-bandwidth bound, while pre-fill is compute-bound, as pre-fill amortizes memory costs over many tokens.
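
The roofline reasoning in these notes can be sketched in code. This is an illustrative model under stated assumptions, not Pope's own implementation: one parameter is treated as one unit of both compute work and memory traffic (constants such as FLOPs per parameter and bytes per parameter are folded into the hardware ratio), the step time is taken as the maximum of the two terms, and all function names are hypothetical.

```python
def compute_time(batch, active_params, flops):
    """Compute time for one step: linear in batch size."""
    return batch * active_params / flops


def memory_time(batch, total_params, kv_per_request, bandwidth):
    """Memory time: a constant weight fetch plus a KV-cache term linear in batch size."""
    return (total_params + batch * kv_per_request) / bandwidth


def step_time(batch, active_params, total_params, kv_per_request, flops, bandwidth):
    """Roofline estimate: the step is bound by whichever resource takes longer."""
    return max(
        compute_time(batch, active_params, flops),
        memory_time(batch, total_params, kv_per_request, bandwidth),
    )


def balanced_batch(hardware_ratio, active_params, total_params):
    """Batch size at which weight-fetch time equals compute time (KV cache ignored)."""
    return hardware_ratio * total_params / active_params


if __name__ == "__main__":
    # Hardware ratio of ~300 and 32-of-256 sparsity, as quoted in the notes.
    print(balanced_batch(300, 32, 256))  # 2400.0
```

With the quoted hardware ratio of about 300 and the 32/256 sparsity figure, the balance point lands at 2,400, inside the 2000-3000 range mentioned above. Below that batch size the step is memory-bound, so adding requests is nearly free; above it, compute time grows linearly.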
Also from this episode: (2)

Models (1)

  • Empirical research on mixture-of-experts shows model quality can increase with sparsity. An older paper found a 64-expert model with 270M active parameters matched the quality of a dense 1.3B parameter model.

AI & Tech (1)

  • Pipelining reduces the memory capacity needed per rack for model weights but does not reduce the memory needed for the KV cache, which becomes the dominant memory consumer.
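
The scheduling figures from this episode (a train every 20 ms, a 40 ms worst case, and roughly 15 ms to read a full HBM) follow from simple arithmetic. A minimal sketch, assuming decimal units (1 TB = 1000 GB) and that each train takes one interval to complete; the function names are hypothetical.

```python
def hbm_read_time_ms(capacity_gb, bandwidth_tb_per_s):
    """Time in milliseconds to stream the full HBM capacity once."""
    bandwidth_gb_per_s = bandwidth_tb_per_s * 1000  # decimal units: 1 TB = 1000 GB
    return capacity_gb / bandwidth_gb_per_s * 1000


def worst_case_latency_ms(interval_ms):
    """A request that just misses a departure waits one interval, then rides the next train for one interval."""
    return 2 * interval_ms


if __name__ == "__main__":
    print(f"HBM read time: {hbm_read_time_ms(288, 20):.1f} ms")
    print(f"Worst-case latency: {worst_case_latency_ms(20)} ms")
```

For 288 GB at 20 TB/s the read time comes to about 14.4 ms, consistent with the "about 15ms" figure in the notes.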

Power ranges: AI faces supply crunch · Apr 29

  • OpenAI shut down its Sora video generation tool to allocate scarce computing resources toward more lucrative ventures, reflecting an industry-wide AI compute shortage.
  • Weekly AI token processing on Open Router quadrupled from January to March 2024, illustrating surging AI demand that hardware cannot match.
  • Five major U.S. cloud providers, including Amazon, Meta, and Microsoft, will spend close to $700 billion on AI data center buildouts this year.
  • Data center construction faces local opposition over electricity, land, and water usage, causing project delays amid the urgent AI capacity push.
  • NVIDIA supplies over two-thirds of the world's AI processing power, but its chips are sold out, forcing companies to use hardware that is two to three years old.
  • TSMC is the sole manufacturer for most advanced AI chips. Its capital expenditures are increasing by $60 billion this year, but capacity remains constrained.
  • Elon Musk's proposed 'TerraFab' aims to exceed all current chip fabrication capacity by 2030, a project analysts estimate would cost $5 to $13 trillion.
  • A prolonged AI supply crunch could reverse the trend of falling inference prices, leading to higher costs for users and potentially slowing AI adoption.
Also from this episode: (6)

AI & Tech (5)

  • A sophisticated spyware attack in Indonesia used a fake tax app to steal biometric data and drain over $26,000 from a charity accountant's bank accounts.
  • Criminal groups now operate a 'malware as a service' model, buying and selling stolen data and malicious software on platforms like Telegram to execute rapid, personalized attacks.
  • The global cybercrime industry is estimated to generate $500 billion annually, a scale comparable to the global illicit drug trade.
  • Security firm Infoblox identified a software cluster targeting victims in over 20 countries, with criminals integrating AI chatbots and deepfake tools to enhance attacks.
  • Allbirds is abandoning its footwear business, selling all shoe assets and rebranding as Newbird AI to pivot towards AI compute infrastructure.

Business (1)

  • Millennial-focused direct-to-consumer brands like Allbirds face pressure from rising interest rates, expensive online ad markets, and competition from larger, established companies.