AI isn’t getting smarter. It’s just getting hungrier. While labs pour billions into larger models and more data, a quiet consensus is forming: progress is built on brute force, not breakthroughs.
Dwarkesh Patel laid it bare on the Dwarkesh Podcast. By his count, a child learns language from roughly 200 million words, while GPT-4 was trained on 100 trillion. That gap, roughly 500,000-fold, isn’t noise - it’s the signal. We haven’t cracked efficient learning. We’ve just scaled the waste.
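Taking the two word counts quoted above at face value, the size of the gap is simple arithmetic. A minimal sketch; the figures are the ones cited in the text, not independently verified, and the variable names are illustrative only.

```python
# Word counts as quoted in the article (assumptions, not verified figures).
child_words = 200_000_000            # ~200 million words heard by a child
gpt4_words = 100_000_000_000_000     # 100 trillion, as cited for GPT-4

# How many times more text the model consumed than the child.
ratio = gpt4_words // child_words
print(f"data gap: {ratio:,}x")       # data gap: 500,000x
```

Whatever the exact training figure turns out to be, the ratio stays many orders of magnitude above one.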
Scaling laws offer little comfort. Even under optimistic Chinchilla-style estimates, adding parameters cuts data needs only about tenfold. Humans learn complex skills from one or two examples. AI runs thousands of rollouts per task using GRPO (Group Relative Policy Optimization), a reinforcement learning method that samples many attempts at each prompt and reinforces the ones that score above the group average.
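To make "thousands of rollouts" concrete, here is a rough sketch of the group-relative scoring step that gives GRPO its name: each attempt at a prompt is compared with the other attempts in its group. It covers only the advantage normalisation, not the clipped policy update or KL penalty, and the rewards and the function name `group_relative_advantages` are hypothetical.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout against its own group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for 8 rollouts of one prompt (1.0 = task solved).
rewards = [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f}  advantage={a:+.2f}")
```

Successful rollouts get a positive advantage and failures a negative one, so the learning signal exists only when the group contains a mix of outcomes. That is part of why so many attempts are needed per task.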
The industry’s dirty secret? Human experts are doing the real work. Companies like Scale AI and Surge pay lawyers, consultants, and specialists to generate labeled data - thousands of examples for every narrow task. This isn’t emergent intelligence. It’s hand-stitched automation.
"The intelligence we see is a collection of expert trajectories congealed in an RL environment."
- Dwarkesh Patel, Dwarkesh Podcast
This dependence on expert data is also why open-source models catch up so fast. They distill outputs from public APIs, effectively reverse-engineering the human-labeled training sets behind them. The moat isn’t architecture or code - it’s access to expensive, expert-generated data.
And the money is real. Patel estimates the labeling industry already generates billions in revenue and is headed toward tens of billions. Labs don’t need efficient AI to automate white-collar work. They can amortize massive training costs across billions of inferences. Brute force wins in the market, even if it fails in the lab.
Alice Zhang of Verge Labs offers a different path. Her team built a dataset from over 12,000 human brains - the largest directly grounded in patient tissue. For neuroscience, where you can’t biopsy living brains, this is the LiDAR that anchors inference from blood samples.
"We learned predicting patient response is more valuable than just discovering a drug."
- Alice Zhang, This Week in Startups
Zhang’s moat isn’t bigger models. It’s a decade-long pipeline of biological truth. Where others scale data, she grounds it. Her multimodal transformer fuses blood markers with molecular autopsies, predicting drug efficacy before symptoms appear.
The divergence is stark. One path doubles down on data volume. The other bets on data quality. One hires armies of experts to simulate reasoning. The other builds infrastructure to capture ground truth.
The real test comes next. Can AI automate its own research? If not, the data black hole will keep widening - and the cost of pretending will only grow.

