AI & TECH

AI compute crunch stalls model progress, triggers layoffs

Friday, May 1, 2026 · from 5 podcasts
  • Tech giants are liquidating thousands of jobs to fund multi-billion dollar AI chip and data center deals.
  • Physical hardware limits - cable density, power, and memory bandwidth - now dictate AI model architecture.
  • The supply crunch acts as a natural brake, forcing firms to throttle services and use outdated hardware.

The AI boom is hitting a physical wall. While software iterates in weeks, building the fabs, data centers, and power infrastructure to run it takes years. This isn't just a chip shortage. Lead times for electrical transformers now stretch to five years.

According to Shailesh Chitnis on The Intelligence, Nvidia chips are so scarce that companies are resorting to hardware that is three years old. The industry is in a state of 'throttling,' with Anthropic changing service terms to dissuade peak usage and OpenAI reportedly prioritizing compute for lucrative ventures over new tools.

"The demand for AI tokens is growing exponentially, but the physical world is hitting a wall."

- Shailesh Chitnis, The Intelligence

The bottleneck is structural. Reiner Pope explained on the Dwarkesh Podcast that low-latency AI is inherently inefficient because the cost of fetching model weights from memory can't be amortized without batching thousands of users. The physical design of server racks - dictated by cable density and bend radius - now limits how many GPUs can communicate at high speed, constraining model architecture.
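
Pope's amortization point can be sketched with a toy per-token cost model. The constants below are illustrative assumptions, not figures from the episode:

```python
# Toy model of per-user decode cost. Fetching the model weights from HBM is
# a fixed cost per step; batching shares that cost across every user served
# in the same step, while per-token compute is paid individually.
WEIGHT_FETCH_MS = 15.0        # ms to stream all weights from memory (assumed)
COMPUTE_MS_PER_TOKEN = 0.005  # per-token compute cost in ms (assumed)

def ms_per_user_token(batch_size):
    """Cost attributed to one user for one generated token."""
    return WEIGHT_FETCH_MS / batch_size + COMPUTE_MS_PER_TOKEN

solo = ms_per_user_token(1)        # this user pays the full weight fetch
batched = ms_per_user_token(2000)  # fetch amortized over 2,000 users
print(f"{solo / batched:.0f}x cheaper per token when batched")
```

Even batching two users together nearly halves the dominant weight-fetch term; at thousands of users it all but vanishes, which is the thousand-fold gap Pope describes.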

Faced with these limits, capital is being forcibly reallocated. Jason Calacanis outlined the calculus on This Week in Startups: Meta framed recent layoffs as a way to offset the cost of massive AI infrastructure investments. It's a literal trade of payroll for compute power, a pattern repeating across profitable tech firms.

"Meta’s recent layoffs were framed by Chief People Officer Janelle Gale as a way to offset the cost of AI investments. The company is liquidating human headcount to pay for multi-billion dollar deals for chips."

- Jason Calacanis, This Week in Startups

This creates a recursive economic risk. Breaking Points highlighted that layoffs at Meta, Oracle, and Microsoft are using AI as a pretext to shed labor and trigger stock bumps, decoupling executive wealth from a hollowing labor market. If the AI promise fails, there's no safety net.

The crunch acts as a natural brake on progress. With hyperscalers spending $700 billion on data centers this year, the ultimate limits may be land, water, and local opposition - problems money can't immediately solve.

Source Intelligence

- Deep dive into what was said in the episodes

Reiner Pope – The math behind how LLMs are trained and served · Apr 29

  • Reiner Pope explains that batch size is the key variable driving the trade-off between inference latency and cost. Batching amortizes the fixed cost of fetching model weights across many user requests.
  • Without batching, serving a large model is uneconomical. Pope states the cost can be a thousand times worse than when batching just two users together.
  • A roofline model for inference time combines compute time and memory fetch time. Compute time scales linearly with batch size, while memory time includes a constant for weights and a term linear in batch size for the KV cache.
  • There is a hard lower bound on inference latency set by the time needed to read all of the model's parameters from memory into the chips, which is independent of batch size.
  • Pope solves for the batch size where compute and memory times are balanced. The formula is batch size >= (Flops / Memory Bandwidth) * (Total Params / Active Params), where the hardware ratio Flops/Bandwidth is ~300.
  • This balance point implies the optimal batch size is approximately 300 times the model's total-to-active parameter ratio. For DeepSeek's 32-of-256 expert sparsity (a ratio of 8), this yields a batch size around 2,000-3,000 tokens.
  • In a scheduled system, a new inference 'train' departs every 20 milliseconds. Worst-case latency for a user is 40ms if they just miss a departure and must wait for the next train to complete.
  • The 20ms schedule is derived from the time to read the entire HBM capacity. For a Rubin-generation system with 288GB HBM and 20 TB/s bandwidth, this is about 15ms.
  • Pope argues increasing sparsity is a pure win for inference cost, as it reduces the active parameters and thus compute time. However, it demands larger batch sizes to amortize weight fetches and consumes more memory capacity.
  • Mixture-of-experts layers use expert parallelism, where different experts are placed on different GPUs. This creates an all-to-all communication pattern that is optimal within a single rack's high-bandwidth scale-up network.
  • Leaving the rack uses a scale-out network about eight times slower than the internal NVLink. This makes crossing rack boundaries for expert parallelism a severe bottleneck.
  • Pope states the primary constraint on increasing rack size is physical: cable density, bend radius, weight, and cooling, not a fundamental technical barrier.
  • Pipeline parallelism, which places different model layers on different racks, is viable for inference because the communication pattern is point-to-point rather than all-to-all, making scale-out latency manageable.
  • Pope argues the value of large scale-up domains like Google's or NVIDIA's Rubin is not primarily memory capacity, but memory bandwidth, which directly lowers inference latency and enables longer context lengths.
  • He presents a heuristic cost model for model development: total cost = pre-training cost + RL cost + inference cost. He conjectures labs roughly equalize these three costs.
  • Applying this model, Pope estimates frontier models are overtrained by a factor of about 100 relative to the compute-optimal Chinchilla scaling law, due to the need to amortize training compute over vast inference usage.
  • Pope reverse-engineers API pricing to deduce system bottlenecks. Gemini charging more for contexts over 200K tokens suggests a memory-to-compute crossover point near that length.
  • Output tokens being ~5x more expensive than input tokens indicates decode is memory-bandwidth bound, while pre-fill is compute-bound, as pre-fill amortizes memory costs over many tokens.
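
The roofline model, the balance-point batch size, and the ~15 ms weight-read floor from the notes above can be sketched together in a few lines. The FLOPS value is chosen only so that Flops/Bandwidth ≈ 300 as quoted, and 2 bytes per parameter (bf16) is an assumption; nothing here is a measured figure.

```python
FLOPS = 6e15       # chip compute, FLOP/s (assumed so that FLOPS/BW ~ 300)
BW = 20e12         # HBM bandwidth, bytes/s (Rubin-class figure from the episode)
HBM_BYTES = 288e9  # HBM capacity, bytes (Rubin-class figure from the episode)

def step_time(batch, active_params, total_params, kv_bytes_per_seq=0):
    """Roofline for one decode step: the slower of compute and memory.
    Compute scales linearly with batch; memory has a constant weight-fetch
    term plus a KV-cache term linear in batch. Assumes 2 bytes/param."""
    compute = batch * 2 * active_params / FLOPS
    memory = (2 * total_params + batch * kv_bytes_per_seq) / BW
    return max(compute, memory)

def balanced_batch(active_params, total_params):
    """Batch size where compute time equals the weight-fetch time
    (KV cache ignored): (FLOPS / BW) * (total / active)."""
    return (FLOPS / BW) * (total_params / active_params)

# 32-of-256 expert sparsity -> ratio of 8 -> batch ~ 300 * 8 = 2400 tokens.
print(balanced_batch(32, 256))

# Hard latency floor and 'train' schedule: time to read a full HBM of
# weights once, independent of batch size (~14-15 ms).
print(HBM_BYTES / BW * 1000, "ms")
```

At the balanced batch the two roofline terms meet: for a hypothetical 256B-total / 32B-active model, `step_time(2400, 32e9, 256e9)` returns the same 25.6 ms whether you evaluate the FLOP term or the byte term.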
Also from this episode: (2)

Models (1)

  • Empirical research on mixture-of-experts shows model quality can increase with sparsity. An older paper found a 64-expert model with 270M active parameters matched the quality of a dense 1.3B parameter model.

AI & Tech (1)

  • Pipelining reduces the memory capacity needed per rack for model weights but does not reduce the memory needed for the KV cache, which becomes the dominant memory consumer.
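
The KV-cache point can be made concrete with a back-of-envelope sizing; the architecture numbers below are hypothetical, not any specific model's:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context, batch, bytes_per_val=2):
    """Keys and values: 2 tensors per layer, each [kv_heads, head_dim]
    per cached token, for every sequence in the batch."""
    return 2 * layers * kv_heads * head_dim * context * batch * bytes_per_val

# Hypothetical 100B-param model in bf16: weights are a fixed ~200 GB,
# and pipelining can split them across racks.
weights_gb = 100e9 * 2 / 1e9

# The cache, by contrast, grows with batch and context: 60 layers, 8 KV
# heads of dim 128, 32K-token contexts, a ~2,000-sequence decode batch.
kv_gb = kv_cache_bytes(60, 8, 128, context=32_000, batch=2_000) / 1e9

print(f"weights: {weights_gb:.0f} GB, KV cache: {kv_gb:.0f} GB")
```

With these (hypothetical) numbers the cache runs to tens of terabytes while the weights are a few hundred gigabytes, which is why splitting the weights across pipeline stages barely dents total memory demand.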

Power ranges: AI faces supply crunch · Apr 29

  • OpenAI shut down its Sora video generation tool to allocate scarce computing resources toward more lucrative ventures, reflecting an industry-wide AI compute shortage.
  • Weekly AI token processing on OpenRouter quadrupled from January to March 2024, illustrating surging AI demand that hardware cannot match.
  • Five major U.S. cloud providers, including Amazon, Meta, and Microsoft, will spend close to $700 billion on AI data center buildouts this year.
  • Data center construction faces local opposition over electricity, land, and water usage, causing project delays amid the urgent AI capacity push.
  • NVIDIA supplies over two-thirds of the world's AI processing power, but its chips are sold out, forcing companies to use older 2-3 year old hardware.
  • TSMC is the sole manufacturer for most advanced AI chips. Its capital expenditures are increasing by $60 billion this year, but capacity remains constrained.
  • Elon Musk's proposed 'TerraFab' aims to exceed all current chip fabrication capacity by 2030, a project analysts estimate would cost $5 to $13 trillion.
  • A prolonged AI supply crunch could reverse the trend of falling inference prices, leading to higher costs for users and potentially slowing AI adoption.
Also from this episode: (6)

AI & Tech (5)

  • A sophisticated spyware attack in Indonesia used a fake tax app to steal biometric data and drain over $26,000 from a charity accountant's bank accounts.
  • Criminal groups now operate a 'malware as a service' model, buying and selling stolen data and malicious software on platforms like Telegram to execute rapid, personalized attacks.
  • The global cybercrime industry is estimated to generate $500 billion annually, a scale comparable to the global illicit drug trade.
  • Security firm Infoblox identified a software cluster targeting victims in over 20 countries, with criminals integrating AI chatbots and deepfake tools to enhance attacks.
  • Allbirds is abandoning its footwear business, selling all shoe assets and rebranding as Newbird AI to pivot towards AI compute infrastructure.

Business (1)

  • Millennial-focused direct-to-consumer brands like Allbirds face pressure from rising interest rates, expensive online ad markets, and competition from larger, established companies.

The Defense Tech Startup YC Kicked Out of a Meeting is Now Arming America | E2280 · Apr 25

  • The increasing power of hardware like Macs and Dell's GB300/3000 workstations will enable startups to develop local, open-source AI models trained on proprietary data.
  • KPMG, Meta, and Nike recently announced layoffs affecting 8,000-10,000 employees in total. Janelle Gale, Meta's Chief People Officer, stated these layoffs help offset significant investments in AI infrastructure, like a multi-billion dollar deal with Amazon for Graviton chips.
  • Maruchcci Kim developed Vuebuds, earbuds equipped with cameras that use low power and process AI models on a connected device, aiming to integrate visual AI into commonly worn technology.
  • Kim, who previously worked at Apple for 5 years on AirPods, envisions Vuebuds as a platform for OEMs, licensing software and providing reference hardware, rather than competing directly with established audio brands.
  • Vuebuds offer advantages over smart glasses, which faced cultural resistance, by integrating visual AI into a discreet device already worn by over a billion people. The camera module costs only $1-2.
  • Maruchcci Kim suggests a "wearable AI app store" could enable developers to create niche, impactful applications for Vuebuds. The current Vuebuds stream a monochrome 324x239 image to conserve power, as Wi-Fi draws too much energy.
Also from this episode: (9)

Startups (3)

  • Jason Calacanis identifies a 24-month window for startups to achieve AI relevance, predicting the emergence of multi-deca-billion dollar companies. He plans to focus on Small Language Models (SLMs) and vertical SLMs (VSLMs) for specific functions.
  • Jason Calacanis suggests that business opportunities representing less than 10% (ideally under 1-5%) of a large company's revenue are often seen as distractions, creating prime opportunities for startups, including non-venture-backed ventures.
  • Will Edwards, founder of Firehawk Aerospace, builds solid rocket motors (SRMs) for defense using 3D-printed propellant, a venture started about five years ago (circa 2019-2020) despite initial disinterest from investors like Y Combinator.

Media (1)

  • Calacanis Media Empire launched "This Week in AI," a new podcast with 10 episodes, releasing the first half of each show on the "This Week in Startups" feed.

Safety (1)

  • Lon Harris notes widespread debate about "P-doom" (probability of AI doomsday), with estimations ranging from 20% to 80% among experts, though Jason Calacanis views such concerns as hyperbolic.

Enterprise (1)

  • Firehawk's method uses an energetic pellet feedstock, reducing propellant manufacturing time per batch from two months to between five minutes and six hours, making production safer by removing human labor, and cutting costs in half.

War (3)

  • Firehawk aims to increase US base-bleed motor production fivefold and recently acquired a 640-acre missile integration facility in Mississippi, scaling annual production from 50,000 to 120,000 missiles.
  • Will Edwards believes the US military's transformation to nimble, tech-first systems is only 1% complete, with most innovation still coming from traditional primes despite a $1.5 trillion proposed budget.
  • Edwards highlights that missiles remain critical in modern warfare, citing the use of 14 million artillery shells in Ukraine, which represented 75% of casualties and spurred drone adoption when supplies dwindled.

4/24/26: Trump Floats Endless Iran War, Lebanon Journalist Triple Tap, AI Job Layoffs · Apr 24

  • Emily reports significant layoffs across major tech and retail companies, including Meta (8,000 employees), Nike (1,400 employees), Microsoft (7% US workforce buyouts), Oracle (20,000-30,000 employees), Amazon (16,000 corporate jobs), Block, and Dell (11,000 jobs).
Also from this episode: (14)

Media (2)

  • Saagar notes that Breaking Points reached number fifteen on YouTube this month, with hosts humorously suggesting their high ranking often correlates with escalating global crises and wars.
  • Emily clarifies that Pete Hegseth's 'Book of Tarantino' reference was a Pulp Fiction quote, not an invented Bible verse, and was wrongly characterized by some media outlets.

Politics (3)

  • Crystal highlights Trump's shifting rationale for the Iran conflict, from insisting on a temporary ceasefire due to Iranian division to now implying indefinite engagement, despite his past anti-'Forever Wars' rhetoric.
  • Trump claims he personally kept the Strait of Hormuz closed to prevent Iran from earning $500 million daily, asserting US control over the vital shipping lane.
  • Ryan explains that Congressman Ro Khanna's wealthy wife, with significant family money, is the source of his high stock returns, and Khanna himself has pushed legislation for blind trusts for spouses of Congress members.

War (7)

  • Crystal reports that CBS News estimates 60% of Iran's naval capacity remains intact, challenging Trump's portrayal of a defeated Iranian military.
  • Pete Hegseth's argument that the Strait of Hormuz conflict is primarily a European and Asian problem, not American, is criticized by Emily and Ryan as dishonest, given the interconnectedness of the global economy.
  • Ryan details the killing of Lebanese journalist Amal Khalil by an Israeli drone after she was injured and sought refuge in a home, despite public pleas from the Lebanese government and Red Cross for a ceasefire.
  • Jeremy Lfredo's direct message to the Israeli number that threatened Amal Khalil received a response claiming Khalil was a Hezbollah spy and threatening other affiliated journalists.
  • Crystal connects reports of widespread looting by Israeli soldiers in southern Lebanon, cited by Haaretz, to the logical outcome of a long-term dehumanization project against Palestinians and those resisting Israel.
  • Crystal argues that ongoing conflicts like the Iran War hinder global cooperation necessary for developing limiting principles and safeguards for AI, similar to how nuclear arms control was achieved.
  • Crystal posits that a 'walk away' strategy for the Iran conflict is unstable, given the need to restore free flow through the Strait of Hormuz, Israel's desire for war, and the tightening global economy.

Energy (1)

  • Ryan explains the US strategic goal is to halt Iran's petroleum industry by filling storage, a process that could take 4 to 8 weeks to recover from, and questions if the global economy can endure such disruption.

Diplomacy (1)

  • Crystal attributes the New York Times report on Khamenei Jr.'s health and use of written messages to a US attempt to portray Iran as chaotic, despite no evidence of internal government breakdown.

Stewart Brand, Silicon Valley’s Favorite Prophet, on Life’s Most Important Principle · Apr 24

  • Steve Jobs described Brand's *Whole Earth Catalog* (late 1960s) as "Google in paperback form 35 years before Google," made with typewriters, scissors, and Polaroid cameras, inspiring the internet's development.
  • Brand supports right-to-repair legislation, noting its progress in states like Massachusetts and Colorado. He highlights John Deere as a "poster child" for corporate resistance that necessitates government intervention.
Also from this episode: (6)

Society (1)

  • Ezra Klein announced a forum on California housing affordability, co-hosted by The New York Times, Housing Action Coalition, and other organizations on Friday, May 8th.

History (2)

  • Ezra Klein identifies Stewart Brand as a pivotal thinker for the internet's culture and Silicon Valley's early idealistic ethos, influencing events from 1960s counterculture to early online communities.
  • Stewart Brand describes the 1960s "Back to the Land" communes as college students attempting to reinvent civilization, though all efforts ultimately failed. He highlights lessons learned like the practical costs of "free love" and the boredom of isolated rural life.

Media (1)

  • Brand explains the *Whole Earth Catalog* was a large, folio-sized publication filled with practical knowledge (e.g., beekeeping, candle making) that conferred "agency" on users, much like YouTube does today.

Psychology (1)

  • Brand recounts a minor psychedelic experience at his $20-a-month North Beach apartment that inspired his campaign for NASA to release a full photograph of Earth, believing it would be transformative.

Biology (1)

  • Stewart Brand, a biologist by training, defines maintenance as "to keep things going," illustrating its pervasive role in all living systems, from biological functions to human civilization and planetary stewardship.