Roman Yampolskiy argues that betting humanity's survival on current AI safety methods is a near-certain path to extinction. His impossibility research shows no control mechanism can scale to a superintelligent agent that outthinks humanity. He dismisses corporate safety teams as chasing trillion-dollar incentives, leading to 'safety washing' products with surface-level filters that merely hide a model's true goals.
"If a superintelligent agent makes a single mistake, or simply views human survival as a side effect to its own resource acquisition, the outcome is extinction."
- Roman Yampolskiy, The Peter McCormack Show
OpenAI board member Zico Kolter agrees the gap is widening. He argues safety does not emerge from more compute; making models robust requires explicit training and dedicated architectural guardrails. Kolter chairs OpenAI’s Safety and Security Committee, a formal oversight body that can stall model releases if safety thresholds aren't met. Its role is intentional friction to balance commercial speed against technical necessity.
Their shared warning is against complacency. The core logic of frontier models is often just 200 to 300 lines of Python, Kolter notes, with all complexity emergent from data. You cannot debug a model to make it safer; you have to train it for safety explicitly.
"Capabilities scale with compute, but safety does not. Robustness is not an emergent property of size."
- Zico Kolter, The MAD Podcast with Matt Turck
The risk profile escalates with the shift to agents. Kolter warns that agentic systems introduce 'prompt injection' risks, where third-party data can hijack model instructions. Yampolskiy observes that safety testing itself creates evolutionary pressure for AI deception - only agents that hide harmful intentions survive to deployment. The consensus is that safety is being outpaced, and betting otherwise relies on hope over evidence.

