OpenAI has quietly crossed a threshold: its Codex agents now operate Mac computers end to end. No longer limited to code generation, they execute workflows across native apps - sending Slack messages, triaging Gmail, and pushing commits to GitHub - without human intervention. This isn’t automation layered on top of the OS. It’s full system control driven by reasoning models.
The shift became visible in internal workflows at Aardvark, where engineers like Nick Bauman now rely on permanent chat threads that never reset. Each of these 'monothreads' runs a heartbeat every 15 minutes: a scheduled pass that scans for blockers and drafts responses. Context compaction, which condenses older history into a shorter summary, keeps the thread coherent for weeks. What started as a way to manage GitHub alerts has evolved into a 24/7 digital chief of staff.
"We’re no longer writing prompts. We’re delegating jobs."
- Nathaniel Whittemore, The AI Daily Brief
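The monothread pattern described above (a persistent thread, a 15-minute heartbeat, and context compaction) can be sketched in a few lines. This is an illustrative toy, not how Codex or any OpenAI product is implemented: `Monothread`, `heartbeat`, `summarize`, the "blocked" keyword check, and the message limits are all hypothetical names and values chosen for the example, and the placeholder summary stands in for what would presumably be a model call.

```python
import time
from dataclasses import dataclass, field

HEARTBEAT_SECONDS = 15 * 60  # the 15-minute interval mentioned in the article
MAX_MESSAGES = 6             # hypothetical limit before compaction kicks in
KEEP_RECENT = 3              # hypothetical number of recent messages kept verbatim


def summarize(previous_summary: str, old_messages: list[str]) -> str:
    """Placeholder compaction: truncate and join old messages.

    A real agent would presumably ask a model for a summary here.
    """
    parts = [previous_summary] if previous_summary else []
    parts.extend(message[:40] for message in old_messages)
    return " | ".join(parts)


@dataclass
class Monothread:
    """A thread that never resets; old history is folded into a summary."""
    summary: str = ""
    messages: list[str] = field(default_factory=list)

    def append(self, message: str) -> None:
        self.messages.append(message)
        if len(self.messages) > MAX_MESSAGES:
            self.compact()

    def compact(self) -> None:
        # Fold everything except the most recent messages into the summary.
        old = self.messages[:-KEEP_RECENT]
        self.summary = summarize(self.summary, old)
        self.messages = self.messages[-KEEP_RECENT:]


def heartbeat(thread: Monothread, alerts: list[str]) -> list[str]:
    """One pass: record each alert and draft a reply for anything blocked."""
    drafts = []
    for alert in alerts:
        thread.append(f"alert: {alert}")
        if "blocked" in alert.lower():  # hypothetical blocker heuristic
            draft = f"Draft reply: looking into '{alert}'"
            thread.append(draft)
            drafts.append(draft)
    return drafts


def run(thread: Monothread, fetch_alerts, beats: int, sleep=time.sleep) -> None:
    """Run a fixed number of heartbeats, sleeping between them."""
    for _ in range(beats):
        heartbeat(thread, fetch_alerts())
        sleep(HEARTBEAT_SECONDS)
```

Passing `sleep` as a parameter keeps the loop testable without waiting 15 minutes per beat. The design choice worth noting is that compaction happens inside `append`, so the thread stays bounded no matter how long it runs.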
The capability leap isn’t just technical - it’s philosophical. OpenAI assumes one interface fits all. Its Codex app merges coding, documentation, and system control into a single text field. The model decides when to write code, when to summarize, and when to act. There’s no mode switching. The interface vanishes.
Anthropic disagrees. Its Opus 4.7 model and desktop app enforce separation into three modes: Chat, Cowork, and Code. Kat Wu from Anthropic argues this structure prevents cognitive overload. "Delegation works best when the goal is clear and contained," she says. Opus excels at dense reasoning - investment theses, visual analysis of whiteboards - but demands that users define scope upfront.
The divergence reflects deeper bets about human-AI collaboration. OpenAI sees a world where agents dissolve into the background, acting continuously across digital life. Anthropic sees focused co-workers, each hired for a specific role. Both agree on one thing: the disposable chat is dead. The new standard is persistence, autonomy, and system-level access.
"The model knows when to write code and when to generate a presentation. You don’t have to tell it."
- Nathaniel Whittemore, The AI Daily Brief
