
The Frontier

Your signal. Your price.

AI & TECH

Anthropic's Mythos sparks AI arms race

Monday, April 20, 2026 · from 5 podcasts
  • Anthropic’s unreleased Mythos model can find zero-day bugs in critical software, prompting national security-level scrutiny.
  • Rivals say GPT-5.4 already matches Mythos on exploits, suggesting the 100-day 'safety pause' masks compute shortages.
  • The model signals a shift: elite coding now equals hacking, and AI agents are rewriting software development.

Anthropic’s unreleased Mythos model isn’t just another LLM upgrade. It’s a weapon. Security researchers who’ve tested it report it discovered a 27-year-old vulnerability in OpenBSD and zero-days in FFmpeg - capabilities that have triggered emergency meetings with Treasury and Fed officials.

The stated reason? National security. The real reason may be financial. John Arnold and Marty Bent argue the AI briefing was a cover to gather Wall Street leaders over a $1 trillion hole in private credit, where insurers like Carlisle are freezing withdrawals. AI safety offers a quiet pretext to avoid a bank run.

Meanwhile, the technical claims are under fire. Brett Winton at ARK Invest notes third-party tests show GPT-5.4 can find many of the same exploits. The 100-day quarantine, he argues, is less about safety than compute scarcity. Anthropic lacks the H100s to serve the public, so it’s selling early access to patching services to the top 40 companies via Project Glasswing.

"Tell the world a tool is too dangerous for general release, then charge a premium for the cure."

- Brett Winton, FYI - For Your Innovation

The deeper shift isn’t just in security - it’s in software itself. Ben, on Nerd Snipe, replaced a months-long CLI project with a 30-line Markdown file, letting the agent handle git sync and sandboxing. Code is no longer written; it’s prompted. Even Uncle Bob, the apostle of Clean Code, now uses voice-to-code tools and calls semicolons distractions.
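For a sense of what "prompted, not written" looks like, here is a hypothetical skill file in the spirit of what Ben describes - the name, steps, and commands below are invented for illustration, not his actual BTCA skill:

```markdown
# Skill: btc-price-report

When the user asks for a price report:

1. Fetch the latest BTC price from the configured API endpoint.
2. Append a dated row to `reports/prices.csv`.
3. Commit and push the change (`git add`, `git commit`, `git push`).
4. Run everything inside the sandbox; never touch files outside `reports/`.

Finish with a one-line summary of the price and 24-hour change.
```

The "program" is nothing but high-level directions; the agent harness supplies the git plumbing and sandboxing that a traditional CLI would have implemented by hand.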

China isn’t waiting. Z.ai open-sourced GLM 5.1, a 754-billion-parameter model trained on Huawei chips, capable of 1,700-step autonomous work cycles - eight hours of uninterrupted coding. It proves the US compute advantage is eroding.

"If a proprietary 'God model' can pwn every browser and OS, static defenses are obsolete."

- Austin, Stacker News Live

The game has changed. Hacking is no longer a niche skill - it’s an emergent property of elite coding. The only defense may be decentralized compute, where users sell local GPU power for Bitcoin, or trust graphs that limit agent interactions to verified contacts. The AI cold war is already here.

Source Intelligence

Deep dive into what was said in the episodes

We need to talk about gstack · Apr 18

Also from this episode: (15)

Other (15)

  • Anthropic's Mythos model is significantly larger than previous models, with over 10 trillion parameters, making it exceptionally skilled in coding but also slow, expensive, and dangerous due to emergent hacking capabilities.
  • Anthropic withheld Mythos from public release, citing concerns over its malicious use for hacking; Project Glasswing allows critical infrastructure companies like Microsoft and Cisco to use it for proactive bug detection.
  • Ben notes that external tests show OpenAI's GPT-5.4 Pro replicated almost all security vulnerabilities found by Mythos, suggesting similar capabilities may already be widespread and accessible.
  • Theo criticizes public benchmarks comparing Mythos and GPT-5.4 Pro, arguing they fail to measure actual hacking or security capabilities and may be misleading.
  • Theo contends that exceptional coding ability in AI models inherently leads to emergent security capabilities, creating a new hacker archetype that can leverage AI to bridge knowledge gaps and bypass traditional research experience.
  • Anthropic's security testing for Mythos involved spinning up 100 to 5,000 parallel runs, each seeded with a different project file from a codebase of approximately 1,000 files, with researchers later reviewing detected exploits.
  • Ben and Theo confirmed that Claude Opus 4.6 models can be tricked into leaking their system prompts and internal reasoning traces, demonstrating a vulnerability where smart models can rationalize revealing sensitive configuration data.
  • Robert C. Martin ("Uncle Bob"), author of "Clean Code," has shifted his perspective to embrace agentic engineering, suggesting AI makes programming syntax less important and prioritizes interfaces.
  • Robert C. Martin proposes using AI to conduct programming experiments (e.g., dynamic vs. static typing) without human bias, highlighting an under-explored research area for optimizing AI agent performance with different technologies.
  • Ben emphasizes that even advanced AI models require constant feedback loops like linting, type checks, and formatting commands to correct hallucinations and converge on correct code, rather than achieving perfection in a single attempt.
  • Ben converted his complex BTCA CLI tool into a 30-line Claude skill, demonstrating how AI agents can turn simple markdown instructions into fully functional applications, replacing traditional deterministic programs.
  • Ben praises Gary Tan's GStack approach, which uses collections of markdown-based "skills" in Claude Code to instruct AI agents, allowing for dynamic programming through high-level directions rather than conventional code.
  • Ben endorses the "Boiling the Ocean" thesis, advocating for extensive AI-driven experimentation because the cost of trying new things is low, and AI models consistently exceed perceived limitations.
  • Gary Tan's article, "Thin Harness Fat Skills," differentiates between "deterministic" (traditional, predictable code) and "latent" (dynamic, non-deterministic AI actions) programming, underscoring AI's creative potential in system design.
  • Theo notes that Gary Tan's GBrain project, which processes daily AI session data to build memory systems, enables models to "learn while they sleep," which Theo considers a key component of Artificial General Intelligence (AGI).
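Anthropic's fan-out testing setup described above (hundreds to thousands of parallel runs, each seeded with one project file, findings pooled for human review) maps onto a simple pattern. A minimal sketch, with `audit_file` as a placeholder stand-in for an actual model call:

```python
import concurrent.futures
import random

def audit_file(seed_path: str) -> list[str]:
    # Stand-in for one model run seeded with a single project file.
    # A real harness would send `seed_path` plus repo context to the model
    # and collect candidate exploits; here we return a deterministic placeholder.
    rng = random.Random(seed_path)
    return [f"{seed_path}: candidate issue #{rng.randint(1, 99)}"]

def fan_out(files: list[str], parallelism: int = 100) -> list[str]:
    # One run per seed file, capped at `parallelism` concurrent workers;
    # all findings are pooled into one list for later researcher review.
    findings: list[str] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=parallelism) as pool:
        for result in pool.map(audit_file, files):
            findings.extend(result)
    return findings

if __name__ == "__main__":
    repo = [f"src/file_{i:03}.c" for i in range(250)]
    findings = fan_out(repo, parallelism=100)
    print(len(findings))  # one pooled finding per seed file
```

The per-file seeding matters: each run starts from a different vantage point in the codebase, so the pool of runs covers far more surface than one long session would.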
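Ben's point about feedback loops reduces to a generate-check cycle: the agent rewrites the file, deterministic tools report diagnostics, and the loop repeats until the diagnostics are empty. A sketch under assumed names - `ruff` and `mypy` stand in for whatever lint/type toolchain a project actually uses, and `generate` is the agent call:

```python
import subprocess

def shell_checks(path: str) -> list[str]:
    # Example checker: lint then type-check; tool names are illustrative.
    diagnostics = []
    for cmd in (["ruff", "check", path], ["mypy", path]):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            diagnostics.append(proc.stdout + proc.stderr)
    return diagnostics

def converge(generate, check, path: str, max_rounds: int = 5) -> bool:
    # Let the agent (re)write the file, showing it the previous round's
    # errors, then re-run the checks. Empty diagnostics means convergence.
    diagnostics: list[str] = []
    for _ in range(max_rounds):
        generate(path, diagnostics)
        diagnostics = check(path)
        if not diagnostics:
            return True
    return False
```

The key design choice is that the model never has to be right in one shot; the deterministic tools are the ground truth it iterates against.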

AI's Great Divergence · Apr 16

  • Anthropic has restricted its 'Mythos' model to about 40 partners for limited cybersecurity testing, reflecting a trend of staggered rollouts due to security risks. OpenAI is pursuing a similar rollout strategy for its new model.
  • Meta's new Muse Spark is a natively multimodal reasoning model designed primarily for personal agents, not enterprise use. The model supports tool use, visual chain-of-thought, and multi-agent orchestration.
  • On benchmarks, Muse Spark scored 52.4 on SWE-Bench Pro for coding, placing it near the top models. It excels in visual comprehension, scoring a state-of-the-art 86.4 on the CharViC reasoning benchmark, beating Gemini 3.1 Pro by 6 points.
  • Mark Zuckerberg positions Muse Spark for personal use areas like visual understanding, health, and social content. He frames it as a shift from assistant AI to agentic AI, enabling it to 'do things for you' like creating mini-games or troubleshooting appliances.
  • Z.ai's open-source GLM 5.1, a 754B parameter model, outperforms leading Western models on coding benchmarks with a 58.4 SWE-Bench Pro score. The model demonstrates long-horizon task capability, completing an eight-hour autonomous Linux desktop build.
  • Z.ai leader Lu claims agents could do about 20 steps by the end of last year, but GLM 5.1 can now do 1,700. The model's autonomous work time is cited as a critical new performance curve.
  • Anthropic released Claude Managed Agents to close a notable gap between model capability and business application, as argued by head of product Angela Jiang. The platform bundles an agent harness with production infrastructure, aiming to reduce engineering overhead.
  • Claude Managed Agents enables scheduled, event-triggered, and long-horizon tasks. It abstracts self-hosting complexity, but lacks persistent memory across sessions, making it best suited for discrete, transactional operations.
  • Google introduced 'notebooks in Gemini', integrating Notebook LM's resource management directly into the app. Google's Josh Woodward positions this as building 'a second brain' beyond basic AI chatbot projects.
  • Ethan Mollick notes Muse Spark is fine but doesn't match the big three models, displaying some strange language and looseness with facts. François Chollet criticizes Meta for over-optimizing for benchmarks at the expense of actual usefulness.
  • Alexander Wang of Meta responded to criticism by saying the lab is open to feedback and is upfront about the model's weaknesses, such as low performance on the ARC-AGI 2 benchmark.
  • GLM 5.1 was trained entirely on less powerful Huawei chips, demonstrating China's hardware stack can produce powerful results. Its release two months after the leading US models suggests the US lead over Chinese rivals is down to only a few months.

Mythos And AI Safety | The Brainstorm EP 127 · Apr 15

  • Anthropic is restricting access to its new AI model Mythos for 100 days, offering it only to the top 40 companies through Project Glasswing so they can patch zero-day vulnerabilities the model discovered.
  • Brett interprets Anthropic's Mythos release as a marketing and supply tactic, not genuine safety, arguing it's meant to induce enterprises to pay for early access to fix their code while the company is compute-constrained.
  • Brett says third-party tests have shown many software exploits detected by Anthropic's Mythos can also be found by GPT-5.4, undermining claims of Mythos's unique vulnerability-finding capability.
  • ARK's analysis positions Mythos as materially better on software engineering benchmarks, pulling performance they expected a year from now forward to today, though the 100-day delay cuts that lead to an 8-month advantage.
  • OpenAI is rumored to have a similarly performant model developed over two years that it will release broadly because it currently has more abundant compute than Anthropic.
  • Claude's consumer usage is catching up to ChatGPT, which Brett attributes to workplace adoption spilling over into personal use as people recognize its power.
  • The core strategic debate is whether winning in AI depends on having the best product or controlling the compute supply needed to build the best product.

Also from this episode: (7)

AI & Tech (7)

  • Brett argues AI companies make allocation decisions between training, enterprise service, and consumer business to maximize valuation ahead of a public market entry, securing capital for future compute.
  • Nick sees Meta as a formidable competitor in AI because its advertising business lets it deliver a consumer experience without directly monetizing the model, and it doesn't have to sell compute to others.
  • Nick argues product and distribution ultimately win in AI, citing Cohere's enterprise success based on product fit rather than model capability.
  • Brett notes OpenAI invests more in model training and has better medium-term compute access than Anthropic, per public reports, which affects their product roadmaps.
  • Consumer AI use cases have changed little in three years despite model improvements, while enterprise use has diversified as workers actively seek tools to lighten their workloads.
  • On the enterprise side, Brett argues market share will stabilize around compute supply because if a provider like Anthropic signs too many customers and lacks capacity, customers will churn to a competitor.
  • The group discusses a concept for a new trust-based social network where AI agents interact only with agents of vetted contacts, arguing current algorithmic social media adulterates real friendship.
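The trust-graph concept in the last bullet reduces to a mutual-vetting check: two agents may talk only if each owner has vetted the other. A toy sketch (names and data structure invented for illustration):

```python
def allowed(trust: dict[str, set[str]], a: str, b: str) -> bool:
    # Agents interact only when the vetting edge exists in BOTH directions;
    # a one-way follow is not enough.
    return b in trust.get(a, set()) and a in trust.get(b, set())

trust = {
    "alice": {"bob"},
    "bob": {"alice"},
    "carol": {"alice"},  # carol vetted alice, but not vice versa
}

allowed(trust, "alice", "bob")    # mutual edge: interaction permitted
allowed(trust, "alice", "carol")  # one-way vetting: blocked
```

The mutuality requirement is what distinguishes this from follower-graph social media: an agent cannot be spammed into someone's network unilaterally.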

SNL #219: Killing Satoshi · Apr 13

Also from this episode: (12)

War (1)

  • Keon discusses a story about an F-15E Strike Eagle aircraft with two airmen being shot down over Iran.

Mining (3)

  • Dan, a Bitcoiner in Iceland, shares his experience with a home Bitcoin mining heater called the Open Two from a company called 21 Energy.
  • Dan reports his mining unit achieved 43 terahash per second but was too loud, and that his total household power consumption was nearly 4,000 kilowatt hours over three months at a cost equivalent to $681.
  • Dan earned 115,000 sats, worth about $80, from his mining heater over the same period, projecting a 26-month payback period for the device.
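Dan's 26-month projection checks out arithmetically. The device price is not quoted in the episode, so the ~$695 figure below is an assumption back-solved from his own numbers, not a reported price:

```python
def payback_months(device_cost_usd: float, sats_per_period: float,
                   period_months: float, usd_per_sat: float) -> float:
    # Months until cumulative mining revenue covers the device cost,
    # assuming revenue and price stay constant (a big simplification).
    monthly_usd = sats_per_period * usd_per_sat / period_months
    return device_cost_usd / monthly_usd

# Dan's reported figures: 115,000 sats ≈ $80 over three months.
usd_per_sat = 80 / 115_000

# HYPOTHETICAL device cost, chosen to match his 26-month projection.
print(round(payback_months(695, 115_000, 3, usd_per_sat)))  # → 26
```

At roughly $26.70 of revenue per month, any heater in the $650-$750 range lands in the mid-20s for payback, which is why the exact device price barely moves the conclusion.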

Adoption (1)

  • NeedCreations launched btcedu.app, a Bitcoin education archive where users can earn points and withdraw 100 sats after accumulating 1,000 points.

Protocol (4)

  • Keon cites Brian Quintin's Myers-Briggs survey showing Bitcoiners heavily skew toward INTJ (34%) and INTP (22%) personality types, diverging significantly from the general population.
  • Keon sees the open-agents movement, where people sell compute for Bitcoin, as a bullish counterbalance to centralized AI power and a potential defense against models like Mythos.
  • Aardvark proposes a quantum-safe Bitcoin transaction scheme using Lamport signatures, which results in a 10,000-byte script size and requires 150 dummy signatures with hash commitments.
  • The hosts discuss the upcoming movie 'Killing Satoshi,' directed by Doug Liman and starring Pete Davidson, Casey Affleck, and Gal Gadot, which fictionalizes an investigator trying to expose Bitcoin's creator.
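Aardvark's byte counts are consistent with standard Lamport one-time signatures: revealing one 32-byte preimage per bit of a 256-bit digest is 256 × 32 = 8,192 bytes before any hash commitments, so a ~10,000-byte script is plausible. A minimal sketch of the signature machinery itself (illustrative Python, not the proposed Bitcoin script):

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keygen(bits: int = 256):
    # Private key: two random 32-byte preimages per message-digest bit.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(bits)]
    # Public key: the hash of every preimage.
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def _bits(digest: bytes, n: int) -> list[int]:
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(n)]

def sign(msg: bytes, sk) -> list[bytes]:
    # Reveal one preimage per digest bit. The key is one-time:
    # a second signature leaks enough preimages to enable forgery.
    return [pair[bit] for pair, bit in zip(sk, _bits(H(msg), len(sk)))]

def verify(msg: bytes, sig: list[bytes], pk) -> bool:
    return all(H(s) == pair[bit]
               for s, pair, bit in zip(sig, pk, _bits(H(msg), len(pk))))
```

Security rests only on the hash function's preimage resistance, which is why Lamport schemes survive quantum attackers that break elliptic-curve signatures; the cost is exactly the signature bloat the bullet describes.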

AI & Tech (3)

  • The hosts discuss a New Yorker article characterizing Sam Altman as dishonest, citing his firing from OpenAI's board and claims of misleading Anthropic's founder about AI safety commitments.
  • Anthropic is working with 40 companies through 'Project Glasswing' to test its new AI model, Mythos, for cybersecurity vulnerabilities before a public release.
  • The hosts express concern that Mythos could find zero-day vulnerabilities in critical open-source software, including Bitcoin Core, posing a significant security threat if capabilities are locked away.

Ten31 Timestamp: You Say Ceasefire, and I Say Escalation · Apr 13

  • Marty references reports suggesting Anthropic's Mythos AI model is not as groundbreaking as claimed, since existing models are capable of similar zero-day discoveries (which are in any case illegal to exploit).

Also from this episode: (6)

War (1)

  • Marty Bent notes the US Navy has blockaded Iranian ports in the Strait of Hormuz following brief talks between JD Vance and an Iranian faction, escalating oil markets.

Markets (1)

  • John highlights a map from Rory Johnson showing a significant redirection of Very Large Crude Carriers (VLCCs) to the US Gulf, indicating a shift in oil market leverage towards the US amid global artery closures.

Trade (1)

  • China is curbing sulfuric acid exports starting in May in response to perceived US leverage; the move could disrupt metal processing, phosphate fertilizers, and fiber production.

BTC Markets (2)

  • Marty and John observe Bitcoin's relative strength, trading around $71,800, acting as a risk-off asset during geopolitical and financial uncertainty, contrary to past liquidity crises.
  • John suggests a fractured, multipolar global order, where just-in-time supply chains falter and trust diminishes, creates an ideal environment for Bitcoin as a neutral, sovereign store of value.

AI & Tech (1)

  • Anthropic's Mythos AI model is presented as a significant step function improvement, with reports of it finding zero-day bugs in critical software, prompting national security concerns and government attention.