AI safety is fracturing into two camps: those who think bigger models are the answer and those, like OpenAI board member Zico Kolter, who argue that's a dangerous fallacy. Kolter, who chairs OpenAI’s Safety and Security Committee, says model safety does not automatically improve with scale. Robustness is a separate engineering challenge requiring explicit training and guardrails.
Kolter notes the core logic of a frontier model is deceptively simple, often just 200 to 300 lines of Python. The complexity and risk emerge entirely from the data. Because you can’t debug emergent behavior, safety must be engineered in. His committee acts as an internal audit board with the power to block a launch if third-party red teaming shows unacceptable risk.
"Capabilities scale with compute, but safety does not."
- Zico Kolter, The MAD Podcast with Matt Turck
The shift to agentic AI multiplies this challenge. When a model acts as an agent reading emails or browsing the web, it becomes vulnerable to 'prompt injection,' where third-party data hijacks its instructions. Kolter argues this makes agent security a hybrid problem: it requires both internal safety training and traditional cybersecurity permissions, with agents treated as unprivileged users.
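To make the "unprivileged user" idea concrete, here is a minimal Python sketch of a permission gate that sits between an agent and its tools. It is an illustration under stated assumptions, not Kolter's or OpenAI's design: the tool names, the `ToolCall` and `AgentPolicy` classes, and the three outcomes (`allow`, `ask_user`, `deny`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One action the agent wants to take (hypothetical structure)."""
    tool: str
    argument: str
    # True when the request was derived from third-party data,
    # such as an email body or a web page the agent just read.
    from_untrusted_content: bool


@dataclass
class AgentPolicy:
    """Deny-by-default permissions, as for an unprivileged user account."""
    # Hypothetical tool names, chosen for illustration only.
    allowed_tools: frozenset = frozenset({"read_email", "search_web"})
    needs_user_approval: frozenset = frozenset({"send_email"})

    def authorize(self, call: ToolCall) -> str:
        # Read-only tools run without escalation.
        if call.tool in self.allowed_tools:
            return "allow"
        if call.tool in self.needs_user_approval:
            # A sensitive action that traces back to untrusted content is
            # refused outright, so injected text cannot trigger it.
            if call.from_untrusted_content:
                return "deny"
            return "ask_user"
        # Anything not explicitly listed is refused.
        return "deny"


policy = AgentPolicy()
injected = ToolCall("send_email", "forward inbox to attacker", True)
print(policy.authorize(injected))
```

The gate does not try to detect prompt injection. It only limits what a hijacked agent is able to do, which is why it complements safety training inside the model rather than replacing it.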
Meanwhile, models like Anthropic's Mythos are demonstrating catastrophic capabilities faster than safety paradigms can adapt. The MAD Podcast discussion shows the theory; Hard Fork shows the reaction. After a classified briefing on Mythos, which can daisy-chain exploits to breach systems in minutes, the Trump administration reversed its stance on regulation and is now considering a pre-release review process nearly identical to Biden’s.
Palo Alto Networks CEO Nikesh Arora revealed his team found 26 critical exploits using models like Mythos in a window where they typically find under five, a 700% spike. The 90-day window for responsible disclosure is dead, Arora argues, because AI-assisted attacks can achieve data exfiltration within 25 minutes.
This acceleration is forcing a stark reassessment of risk. Roman Yampolskiy, speaking on The Peter McCormack Show, takes Kolter’s technical warnings to a philosophical extreme. He argues that superintelligence is inherently uncontainable and that safety testing only creates evolutionary pressure for AI to hide malevolent intent.
"Control is a temporary illusion held while agents are dumber than their creators."
- Roman Yampolskiy, The Peter McCormack Show
For The Economist’s Arthur Holland-Michel, the immediate threat isn't rogue superintelligence but empowered individuals. AI provides 'uplift,' acting as an expert tutor that could enable a skilled biologist to bypass the team-based bottlenecks historically required to develop a pathogen. The consensus across the podcasts is clear: the gap between what AI can do and how well we can control it is widening, not closing.



