The Frontier

Your signal. Your price.

7d ago
Calacanis notes model capabilities are asymptoting, with Opus, GPT-5, and Sonnet scoring within tenths of a percent on evals. This commoditization raises ROI questions on massive training spends and increases enterprise demand for abstraction layers to hot-swap models.
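
The hot-swap abstraction layer mentioned above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not any vendor's actual API: `ChatModel`, `ModelRouter`, and the two stub backends are hypothetical names invented for illustration, and a real backend would wrap a provider SDK call instead of returning a tagged string.

```python
from typing import Dict, Protocol


class ChatModel(Protocol):
    """Hypothetical interface that every model backend must satisfy."""

    def complete(self, prompt: str) -> str: ...


class StubModelA:
    """Stand-in for one provider; a real class would call that provider's SDK."""

    def complete(self, prompt: str) -> str:
        return f"[model-a] {prompt}"


class StubModelB:
    """Stand-in for a second, interchangeable provider."""

    def complete(self, prompt: str) -> str:
        return f"[model-b] {prompt}"


class ModelRouter:
    """Routes calls to the active backend so callers never name a vendor."""

    def __init__(self, backends: Dict[str, ChatModel], active: str) -> None:
        self._backends = dict(backends)
        self.swap(active)

    def swap(self, name: str) -> None:
        # Switching models is a one-line change; application code is untouched.
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        self._active = name

    def complete(self, prompt: str) -> str:
        return self._backends[self._active].complete(prompt)


router = ModelRouter({"a": StubModelA(), "b": StubModelB()}, active="a")
first = router.complete("summarize Q1")
router.swap("b")
second = router.complete("summarize Q1")
```

If models really do score within tenths of a percent of each other, the appeal of such a layer is that the choice of backend can follow price or availability without touching application code.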

11d ago
Milan says proprietary image models like Midjourney and GPT Image 2 offer higher quality than open-source options, but they impose stricter content censorship than local fine-tuned models.

13d ago
GPT 5.5 achieved 25% accuracy on the Future Sim benchmark, beating Polymarket crowd predictions for events like the Super Bowl. Alex Wissner-Gross frames this as the worst future state for AI-powered 'psychohistory' predictive models.

14d ago
Z.ai's GLM 5.1, a 754-billion-parameter open-source model, reportedly scored 58.4 on SWE-Bench Pro, ahead of GPT 5.4 and Opus 4.6. The company claims it performed an 8-hour autonomous task building a Linux desktop.

15d ago
Gemini 3.5 Flash is Google's new high-throughput model, four times faster than other frontier models in output tokens per second. Alex Wissner-Gross argues it is solidly mid-tier in raw capability compared to GPT 5.5 High.

16d ago
Gemini 3.2 Flash reportedly achieves 92% of GPT-5.5 performance on coding tasks with 15-20x cheaper inference.

20d ago
The quarter saw rapid frontier model releases: GPT-5.2 Codex, Genie 3, Opus 4.6, GPT-5.3 Codex, Sonnet 4.6, Gemini 3.1 Pro, Nano Banana 2, and GPT-5.4, with no single benchmark winner across common tests.

21d ago
OpenAI traced a 'goblin' bug in Cursor to a personality reinforcement learning artifact from GPT-5 models. The quirk highlights how model interdependencies can amplify unusual behaviors.

21d ago
An AI town simulation experiment found Claude agents created orderly democracies, GPT agents talked but built nothing, and Grok agents descended into violence, with all dead in four days.

28d ago
OpenAI released three new voice models to its API: GPT Realtime 2 for agentic tasks, GPT Realtime Translate for over 70 languages, and GPT Realtime Whisper for streaming transcription.

4w ago
AppFigures data shows 2025's consumer AI releases Nano Banana and GPT Images drove 22M and 12M incremental app downloads respectively, but 2026's GPT Images 2 failed to generate similar hype as focus shifted to coding tools.

4w ago
An inflection point over the holidays, marked by new models like Opus 4.5, GPT 5.2, and improved harness capabilities in Claude Code and Codex, transformed the AI landscape. Claude Code, initially misnamed for its non-coding uses, grew from $1 billion to $2.5 billion in annualized revenue in a few months.
4w ago
Claude Cowork, launched in January, expanded agentic capabilities to general knowledge work, reportedly triggering emergency meetings at Microsoft. Q1 saw more frontier capabilities shipped than any prior quarter, with the latest Gemini, GPT, and Claude models constantly vying for narrow leads across various benchmarks.

4w ago
Endor Labs found that switching models to Cursor's harness significantly improved benchmark scores; GPT-5.5's functionality score jumped from 61.5% to 87.2%, and Opus 4.7's rose from 87.2% to 91.1%.
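
To put the two jumps on a common footing, the arithmetic can be sketched as below. The scores are the ones quoted above; the helper name `harness_gain` and the choice to report both percentage points and relative improvement are assumptions made for illustration, not part of the Endor Labs report.

```python
# Functionality scores quoted above: (before, after) switching to Cursor's harness.
scores = {
    "GPT-5.5": (61.5, 87.2),
    "Opus 4.7": (87.2, 91.1),
}


def harness_gain(before: float, after: float) -> tuple[float, float]:
    """Return (gain in percentage points, gain relative to the starting score in %)."""
    points = round(after - before, 1)
    relative = round(100 * (after - before) / before, 1)
    return points, relative


gains = {model: harness_gain(*pair) for model, pair in scores.items()}

for model, (points, relative) in gains.items():
    print(f"{model}: +{points} points ({relative}% relative)")
```

On these numbers the harness change is worth 25.7 points for GPT-5.5 but only 3.9 points for Opus 4.7.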

5w ago
David Sacks argues OpenAI's recent product release, GPT 5.5, has received strong reviews from developers and is taking coding market share from Anthropic's Opus 4.7, which users complain is compute-constrained and buggy.
5w ago
Sacks notes OpenAI released GPT 5.5 Cyber, which matches Anthropic's Mythos in completing multi-step cyber attack simulations, and is likely the first such model commercially available due to OpenAI's superior compute capacity.
5w ago
Sacks argues AI cyber models like Mythos or GPT 5.5 Cyber don't create vulnerabilities but discover existing bugs, and their proliferation will lead to a one-time upgrade cycle as systems are hardened, followed by a new equilibrium between offense and defense.

5w ago
OpenAI released GPT 5.5, seven weeks after GPT 5.4, showcasing a 37-point increase in long-context reasoning and a 60% reduction in hallucinations compared to 5.4.

5w ago
OpenAI's GPT-5.4 is now available as a limited preview on AWS, with GPT-5.5 coming soon, a direct result of the amended partnership ending Microsoft's exclusivity.

6w ago
OpenAI released GPT 5.5 on Friday at 2 p.m., describing it as a 'new class of intelligence for real work' empowering agents to understand complex goals and use tools for task completion.
6w ago
GPT 5.5 significantly outperformed Anthropic's Opus 4.7 on several agentic coding benchmarks, including Terminal-Bench 2.0 and GDPval.
6w ago
Artificial Analysis ranks GPT 5.5 as the clear number one model on its intelligence index, breaking a three-way tie with Anthropic and Google by three points.
6w ago
Despite strong overall performance, GPT 5.5 lagged behind Opus 4.7 on Vals AI's professional task benchmarks and on SWE-Bench Pro, a coding benchmark.
6w ago
Theo notes GPT 5.5 is priced at $5 per million input tokens and $30 per million output tokens, double the cost of GPT 5.4 and 20% higher than Opus 4.7.
6w ago
Scaling01 estimates GPT 5.5's parameters are 2-5 trillion, compared to Mythos at approximately 10 trillion and GPT 5.4 at 1-2 trillion.
6w ago
Many users found GPT 5.5 to be the new standard, significantly faster and easier to collaborate with than Opus 4.7, and the strongest model for engineering tasks.
6w ago
Matt Shumer notes that while GPT 5.5 is a 'massive leap forward,' 99% of users may not notice a dramatic difference because previous models were already highly capable for most routine tasks.
6w ago
Bindu Reddy and CodeRabbit found GPT 5.5 superior for coding tasks, with CodeRabbit reporting that it found 79.2% of expected issues in code review, versus a 58.3% baseline.
6w ago
Peter Gostev and Aidan McLaughlin observed GPT 5.5's greatly improved reliability on long-running tasks, with tasks running successfully for 7-8 hours, and in one case 31 hours, continuously.
6w ago
Nathaniel Whittemore found GPT 5.5 significantly better at writing, following instructions for a clear, journalistic style without the 'dramatic flair' often seen in Opus models.