
- *(3d ago)* Anthropic's Mythos model is significantly larger than its predecessors, with over 10 trillion parameters, making it exceptionally capable at coding but also slow, expensive, and dangerous due to emergent hacking capabilities.
- Anthropic withheld Mythos from public release, citing the risk of malicious use for hacking; under Project Glass Wing, critical-infrastructure companies such as Microsoft and Cisco can use it for proactive bug detection.
- Ben notes that in external tests, OpenAI's GPT 5.4 Pro replicated almost all of the security vulnerabilities Mythos found, suggesting comparable capabilities may already be widespread and accessible.
- Theo criticizes the public benchmarks comparing Mythos and GPT 5.4 Pro, arguing they fail to measure actual hacking or security capability and may therefore mislead.
- Theo contends that exceptional coding ability in AI models inherently produces emergent security capabilities, creating a new hacker archetype: someone who leverages AI to bridge knowledge gaps and bypass the need for traditional security-research experience.
- Anthropic's security testing for Mythos involved spinning up 100 to 5,000 parallel runs, each seeded with a different project file from a codebase of roughly 1,000 files; researchers then reviewed the exploits the runs surfaced.
- Ben and Theo confirmed that Claude Opus 4.6 models can be tricked into leaking their system prompts and internal reasoning traces, showing that even smart models can be led to rationalize revealing sensitive configuration data.
- Robert C. Martin ("Uncle Bob"), author of "Clean Code," has shifted his perspective to embrace agentic engineering, suggesting that AI makes programming-language syntax less important and shifts the priority to interfaces.
- Robert C. Martin proposes using AI to conduct programming experiments (e.g., dynamic vs. static typing) without human bias, highlighting an under-explored research area: which technologies AI agents perform best with.
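The experiment Martin proposes could be framed as a simple A/B harness: run the same task set under each condition (e.g., a statically vs. dynamically typed stack) and compare pass rates. This is a hypothetical sketch; `run_trial` is a deterministic placeholder where a real harness would launch an agent and run the task's test suite.

```python
import random
from statistics import mean

def run_trial(task: str, condition: str) -> bool:
    """Placeholder for one agent attempt at `task` under `condition`
    (e.g. "static" vs "dynamic" typing). Deterministic so the sketch
    is reproducible; a real trial would report the test-suite result."""
    random.seed(hash((task, condition)))
    return random.random() < 0.5

def compare(tasks: list[str], conditions: list[str], trials: int = 20) -> dict[str, float]:
    """Pass rate per condition over the same task set: the bias-free
    head-to-head comparison the proposal describes."""
    return {
        c: mean(run_trial(f"{t}#{i}", c) for t in tasks for i in range(trials))
        for c in conditions
    }
```

Because both conditions see identical tasks and trial counts, any gap in pass rate reflects the condition rather than human preference, which is the point of taking the human out of the loop.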
- Ben emphasizes that even advanced AI models converge on correct code through constant feedback loops (linting, type checks, formatting commands) that catch hallucinations, rather than achieving perfection in a single attempt.
- Ben converted his complex BTCA CLI tool into a 30-line Claude skill, demonstrating how AI agents can turn simple markdown instructions into fully functional applications, replacing traditional deterministic programs.
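For a sense of what such a conversion looks like, a Claude skill is a markdown file with a short frontmatter header followed by plain-language instructions. The sketch below is hypothetical: the name, description, and steps are invented for illustration and are not Ben's actual BTCA skill.

```markdown
---
name: docs-lookup
description: Hypothetical sketch of a CLI tool re-expressed as a skill
---

When the user asks a question about a library's API:

1. Identify the library and the specific API in question.
2. Fetch the latest documentation for that API.
3. Answer using only what the fetched docs actually say,
   citing the relevant sections.
```

The contrast with a deterministic CLI is that none of the control flow is coded; the agent interprets the numbered steps at run time.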
- Ben praises Gary Tan's GStack approach, which uses collections of markdown-based "skills" in Claude Code to instruct AI agents, allowing dynamic programming through high-level directions rather than conventional code.
- Ben endorses the "Boiling the Ocean" thesis, advocating extensive AI-driven experimentation: the cost of trying new things is now low, and the models consistently exceed perceived limitations.
- Gary Tan's article "Thin Harness, Fat Skills" distinguishes "deterministic" programming (traditional, predictable code) from "latent" programming (dynamic, non-deterministic AI actions), underscoring AI's creative potential in system design.
- Theo notes that Gary Tan's GBrain project, which processes each day's AI session data into memory systems, lets models "learn while they sleep," which Theo considers a key component of Artificial General Intelligence (AGI).