OpenAI Codex now runs Macs autonomously

Sunday, April 19, 2026 · from 1 podcast

Source: The AI Daily Brief

OpenAI’s Codex now autonomously controls Macs, executing tasks across Slack, Gmail, and GitHub.
Persistent 'monothreads' act as always-on digital teammates rather than disposable chats.
A philosophical split emerges: OpenAI’s invisible interface vs. Anthropic’s structured modes.

OpenAI has quietly crossed a threshold: its Codex agents now operate Macs end to end. No longer limited to code generation, they execute workflows across applications - sending Slack messages, triaging Gmail, and pushing commits to GitHub - without human intervention. This isn't automation layered on top of the OS. It's full system control driven by reasoning models.

The shift became visible in internal workflows at Aardvark, where engineers like Nick Bauman now rely on permanent chat threads that never reset. These 'monothreads' run heartbeats every 15 minutes, scanning for blockers and drafting responses. Context compaction keeps them coherent for weeks. What started as a way to manage GitHub alerts has evolved into a 24/7 digital chief of staff.
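The heartbeat pattern described above can be sketched as a single thread object that wakes on a fixed interval, scans whatever arrived since its last beat, drafts replies to anything that looks like a blocker, and schedules its own next wake-up. This is a minimal, hypothetical sketch, not Codex's implementation: the 15-minute interval comes from the article, but the `Message` and `MonoThread` names, the keyword heuristic standing in for the model's judgment, and the simulated clock are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

HEARTBEAT = timedelta(minutes=15)  # interval reported in the article

@dataclass
class Message:
    source: str  # hypothetical label, e.g. "slack", "gmail", "github"
    text: str

@dataclass
class MonoThread:
    """A single long-lived thread: never reset, it only accumulates history."""
    history: List[str] = field(default_factory=list)
    next_beat: Optional[datetime] = None

    def due(self, now: datetime) -> bool:
        return self.next_beat is None or now >= self.next_beat

    def heartbeat(self, now: datetime, inbox: List[Message]) -> List[str]:
        """Scan items that arrived since the last beat and draft replies to blockers."""
        # Hypothetical keyword heuristic standing in for the model's own judgment.
        blockers = [m for m in inbox
                    if any(k in m.text.lower() for k in ("blocked", "urgent"))]
        drafts = [f"[{m.source}] Draft reply re: {m.text}" for m in blockers]
        self.history.append(
            f"{now:%H:%M} scanned {len(inbox)} items, found {len(blockers)} blockers")
        self.next_beat = now + HEARTBEAT  # the thread schedules its own next step
        return drafts

# Simulated hour: messages arrive at minute offsets; the clock ticks every 5 minutes.
thread = MonoThread()
start = datetime(2026, 4, 19, 9, 0)
arrivals = {
    0: [Message("slack", "Deploy is blocked on review")],
    30: [Message("gmail", "Weekly newsletter"),
         Message("github", "URGENT: CI failing on main")],
}
pending: List[Message] = []
drafts: List[str] = []
for minute in range(0, 61, 5):
    now = start + timedelta(minutes=minute)
    pending.extend(arrivals.get(minute, []))
    if thread.due(now):
        drafts.extend(thread.heartbeat(now, pending))
        pending.clear()

print("\n".join(thread.history))
print("\n".join(drafts))
```

Over the simulated hour the thread beats five times (09:00 through 10:00) but only drafts two replies; the newsletter is read and ignored. In a real deployment the scan and the drafting would be done by the model, and the growing `history` is exactly what context compaction has to keep manageable.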

"We’re no longer writing prompts. We’re delegating jobs."
- Nathaniel Whittemore, The AI Daily Brief

The capability leap isn’t just technical - it’s philosophical. OpenAI assumes one interface fits all. Its Codex app merges coding, documentation, and system control into a single text field. The model decides when to write code, when to summarize, and when to act. There’s no mode switching. The interface vanishes.

Anthropic disagrees. Alongside its new Opus 4.7 model, its Claude Desktop app enforces separation into three modes: Chat, Cowork, and Code. Kat Wu from Anthropic argues this structure prevents cognitive overload. "Delegation works best when the goal is clear and contained," she says. Opus excels at dense reasoning - investment theses, visual analysis of whiteboards - but demands that users define scope up front.

The divergence reflects deeper bets about human-AI collaboration. OpenAI sees a world where agents dissolve into the background, acting continuously across digital life. Anthropic sees focused co-workers, each hired for a specific role. Both agree on one thing: the disposable chat is dead. The new standard is persistence, autonomy, and system-level access.

"The model knows when to write code and when to generate a presentation. You don’t have to tell it."
- Nathaniel Whittemore, The AI Daily Brief

Source Intelligence

Deep dive into what was said in the episodes

The AI Daily Brief: Artificial Intelligence News and Analysis

Nathaniel Whittemore

How to Use Opus 4.7 and the New Codex • Apr 17

Nathaniel Whittemore says OpenAI's Codex app now has full computer use for Mac, allowing it to see, click, and type across any application, including those without APIs. Multiple agents can work in parallel.
Codex introduces an in-app browser with comment mode, letting users click elements for precise context. Nathaniel Whittemore highlights this for front-end iteration, bug reporting, and workflows where pointing is faster than describing.
Nathaniel Whittemore notes Codex now includes native image generation with GPT Image 1.5, plus rich file previews for artifacts beyond code, so users can create mock-ups and edit images within a single thread.
Pash from OpenAI describes Codex's 'thread over time' feature. Threads persist with history and context, and agents can schedule their own next steps, reducing the overhead of daily catch-up tasks like scanning Slack and email.
Codex now supports project-less threads, which Flavio Adama and Jason Liu argue facilitate unstructured work. Liu calls the feature 'the new Notes app', since users can dive in without first selecting a repository.
Ari Weinstein observes that Codex can operate a GUI as fast as a human. Nathaniel Whittemore cites Aaron Levie of Box, who sees this as a leap for knowledge-worker agents capable of long background tasks like drafting reports and reviewing contracts.
Nick Bauman of OpenAI advocates for a 'monothread' approach in Codex. He keeps a single, long-lived thread that checks his Slack, Gmail, and GitHub hourly to filter noise into actionable signal, shifting from many short chats to a few persistent workstream threads.
Jason Liu provides a recipe for a 'Codex chief of staff'. It uses a local folder vault with an agents.md file, interviews the user to understand responsibilities, and proposes creating project notes and installing plugins like Slack and Gmail.
Nathaniel Whittemore reports Anthropic's Opus 4.7 model shows major benchmark improvements: Finance Agent rose to 64.4%, Office QA Pro to 80.6%, and OSWorld computer use to 78%. It also made about 20% more money on the Vending-Bench 2 test.
Anthropic's Kat Wu advises users to delegate to Opus 4.7 rather than micromanage it, providing the full goal and constraints up front. Boris Cherny details new effort-level settings, recommending 'extra high' for most tasks and 'max' for the hardest.
Nathaniel Whittemore contrasts OpenAI Codex's unified interface with Claude Desktop's segmented one. Codex uses one interface for all tasks, while Claude separates Chat, Cowork, and Code modes, reflecting different bets on user friction versus task specialization.
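Jason Liu's 'chief of staff' recipe starts from a local folder vault with an instructions file. A minimal sketch of that starting state might look like the following, assuming the project notes are created up front rather than proposed by the agent after its interview. The file is written as `AGENTS.md`, the capitalization Codex's own documentation uses; the instruction text, the `projects/` layout, the note template, and the `scaffold_vault` helper are all hypothetical and are not the contents of Liu's recipe.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical standing instructions; not the text of Jason Liu's recipe.
AGENTS_MD = """# Chief of staff
You maintain this vault for me.
- Start by interviewing me about my responsibilities.
- Keep one note per project under projects/ and update it as work moves.
- Propose plugins (for example Slack or Gmail) before installing them.
"""

def scaffold_vault(root: Path, projects: List[str]) -> Path:
    """Create the vault folder, its instruction file, and one note per project."""
    (root / "projects").mkdir(parents=True, exist_ok=True)
    (root / "AGENTS.md").write_text(AGENTS_MD, encoding="utf-8")
    for name in projects:
        slug = name.lower().replace(" ", "-")  # "Q2 Launch" -> "q2-launch"
        note = root / "projects" / f"{slug}.md"
        note.write_text(f"# {name}\n\n## Status\n\n## Next steps\n", encoding="utf-8")
    return root

# Build a throwaway vault in a temporary directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    vault = scaffold_vault(Path(tmp) / "vault", ["Q2 Launch", "Hiring"])
    created = sorted(p.relative_to(vault).as_posix()
                     for p in vault.rglob("*") if p.is_file())
    print(created)
```

Keeping the vault as plain Markdown files means a project-less Codex thread can be pointed at the folder and read or update the notes directly, without any repository setup.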

Also from this episode:

Models

Anthony Kroger and Nick Bauman argue Codex's context compaction is a game-changer. Kroger says he never worries about context windows, and Bauman notes that dropping the assumption that compaction degrades results opens new product directions.
Opus 4.7 has a regression on one long-context retrieval benchmark, dropping from 78.3% to 32.2%. Claude Code creator Boris Cherny says the benchmark is being phased out because it overweights distractors and doesn't reflect real reasoning.
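Neither the episode nor the article describes how Codex actually compacts context. As a generic illustration of the idea Kroger and Bauman are praising, here is a minimal sketch: once a thread's history exceeds a token budget, the oldest turns are folded into a single summary entry while the most recent turns stay verbatim. The word-count token estimate, the `compact` and `naive_summary` helpers, and the budget values are all hypothetical; a real agent would use a proper tokenizer and have the model write the summary.

```python
from typing import Callable, List

def estimate_tokens(text: str) -> int:
    # Crude stand-in: counts whitespace-separated words, not real model tokens.
    return len(text.split())

def compact(history: List[str], budget: int, keep_recent: int,
            summarize: Callable[[List[str]], str]) -> List[str]:
    """Fold the oldest turns into one summary once the history exceeds the budget."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(estimate_tokens(t) for t in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # still fits, or nothing old enough to fold
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

def naive_summary(turns: List[str]) -> str:
    # Hypothetical summarizer: keeps each turn's first sentence.
    # A real agent would ask the model to write this instead.
    return f"SUMMARY of {len(turns)} earlier turns: " + "; ".join(
        t.split(".")[0] for t in turns)

history = [f"Turn {i}. Checked Slack and GitHub, nothing blocking." for i in range(1, 9)]
compacted = compact(history, budget=40, keep_recent=3, summarize=naive_summary)
for entry in compacted:
    print(entry)
```

Because the summary becomes the oldest entry, the next pass folds it in together with newer turns, so a thread can keep running inside a fixed budget indefinitely; how much useful detail survives depends entirely on the quality of the summarizer.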
