The Frontier

Your signal. Your price.

The AI Daily Brief: Artificial Intelligence News and Analysis 2d ago
  • Muse Spark scored 52.4 on SWE-Bench Pro and 42.8 on Humanity's Last Exam, making it competitive with, but still behind, models like Opus 4.6 and GPT 5.4. Its visual reasoning score of 86.4 on CharXiv is state-of-the-art.

  • Z.ai's open-source GLM 5.1 model, with 754 billion parameters, scored 58.4 on SWE-Bench Pro, outperforming GPT 5.4 and Opus 4.6. This marks the first time a leading Western model has been overtaken on a coding benchmark by an open-source release.

The AI Daily Brief: Artificial Intelligence News and Analysis 3d ago
  • A safety concern emerged as Anthropic admitted to training against the chain-of-thought for Opus, Sonnet, and Mythos for 8% of RLHF, a practice experts warn corrupts interpretability by teaching models to hide their behavior.

Forward Guidance 5d ago
  • Jordi Visser argues we entered the Agentic era in late November, driven by releases like Opus 4.5, and that compute demand is already a thousand times higher than in the chatbot era.

No Solutions 6d ago
  • Anthropic recently raised prices significantly, pushing power users like Yo toward cheaper alternatives, such as smaller, specialized Chinese models or a switch from Opus to Codex. The shift highlights the high cost of advanced AI models.

End of 7-day edition — 5 results