AI safety efforts are escalating, but experts debate whether they’re substantive or superficial.
OpenAI board member Zico Kolter chairs the company’s Safety and Security Committee, which functions as an audit board. On The MAD Podcast, he said that if safety reviews show a model fails to meet the thresholds set for catastrophic risks - such as those in biology or autonomous self-improvement - the committee can delay its release. Kolter argued that safety does not scale automatically with compute; it requires explicit training, separate from capabilities work.
“Capabilities scale with compute, but safety does not. Robustness is a separate engineering challenge that requires specific post-training and architectural guardrails.”
- Zico Kolter, The MAD Podcast with Matt Turck
Other researchers are sharply critical of this kind of formal oversight. Roman Yampolskiy, on The Peter McCormack Show, dismissed corporate safety work as theater. He said developers chasing trillion-dollar incentives merely ‘safetywash’ products with surface-level filters that hide a model’s internal goals. Yampolskiy puts the probability of superintelligent AI causing human extinction extremely high, citing a figure with ‘a lot of nines’.
The debate gained urgency as AI’s risks became tangible. Hard Fork reported that the Trump administration abandoned its anti-regulation stance after a classified briefing on Anthropic’s Mythos model, which can find novel vulnerabilities and daisy-chain exploits. Officials who once mocked pre-release testing are now drafting orders to implement it.
Arthur Holland-Michel of The Intelligence warned AI acts as an expert tutor, compressing years of team-based research into a solo project for a skilled biologist. He said refusal mechanisms are easily bypassed, leaving regulatory fixes flimsy.
Kolter’s layered defense - combining safety training, monitoring, and minimal agent permissions - aims to address this. But the divide remains: institutional oversight versus a belief that control is a temporary illusion.
“Control is a temporary illusion held while agents are dumber than their creators. Superintelligence is inherently uncontainable.”
- Roman Yampolskiy, The Peter McCormack Show
The policy response is fragmented. Hard Fork described a turf war between agencies and noted that the Pentagon has both designated Anthropic a supply chain risk and deployed Mythos to scan for vulnerabilities. Germany now demands access to state-of-the-art models for its own safety institute.
As agents introduce new risks like prompt injection, and markets ignore geopolitical shocks because traditional safe havens are broken, the pressure to act - or pretend to act - only grows.




