JUNE 6, 2026 UPDATED

The Frontier

Your signal. Your price.

  • · 7d ago

    Calacanis notes model capabilities are asymptoting, with Opus, GPT-5, and Sonnet scoring within tenths of a percent on evals. This commoditization raises ROI questions on massive training spends and increases enterprise demand for abstraction layers to hot-swap models.

  • · 11d ago

    Milan says proprietary image models like Midjourney and GPT Image 2 offer higher quality than open-source options, but they impose stricter content censorship than local fine-tuned models.

  • · 13d ago

    GPT 5.5 achieved 25% accuracy on the Future Sim benchmark, beating Polymarket crowd predictions for events like the Super Bowl. Alex Wissner-Gross frames this as the worst that AI-powered 'psychohistory' predictive models will ever be.

  • · 14d ago

    Z.ai's GLM 5.1, a 754-billion-parameter open-source model, reportedly scored 58.4 on SWE-Bench Pro, ahead of GPT 5.4 and Opus 4.6. The company claims the model completed an 8-hour autonomous task building a Linux desktop.

  • · 15d ago

    Gemini 3.5 Flash is Google's new high-throughput model, four times faster than other frontier models in output tokens per second. Alex Wissner-Gross argues it is solidly mid-tier in raw capability compared to GPT 5.5 High.

  • · 16d ago

    Gemini 3.2 Flash reportedly achieves 92% of GPT-5.5 performance on coding tasks with 15-20x cheaper inference.

  • · 20d ago

    The quarter saw rapid frontier model releases: GPT-5.2 Codex, Genie 3, Opus 4.6, GPT-5.3 Codex, Sonnet 4.6, Gemini 3.1 Pro, Nano Banana 2, and GPT-5.4, with no single benchmark winner across common tests.

  • · 21d ago

    OpenAI traced a 'goblin' bug in Cursor to a personality reinforcement learning artifact from GPT-5 models. The quirk highlights how model interdependencies can amplify unusual behaviors.

  • · 21d ago

    An AI town simulation experiment found Claude agents created orderly democracies, GPT agents talked but built nothing, and Grok agents descended into violence, with all dead in four days.

  • · 28d ago

    OpenAI released three new voice models to its API: GPT Realtime 2 for agentic tasks, GPT Realtime Translate for over 70 languages, and GPT Realtime Whisper for streaming transcription.

  • · 4w ago

    AppFigures data shows 2025's consumer AI releases Nano Banana and GPT Images drove 22M and 12M incremental app downloads respectively, but 2026's GPT Images 2 failed to generate similar hype as focus shifted to coding tools.

  • · 4w ago

    An inflection point over the holidays, marked by new models like Opus 4.5, GPT 5.2, and improved harness capabilities in Claude Code and Codex, transformed the AI landscape. Claude Code, initially misnamed for its non-coding uses, grew from $1 billion to $2.5 billion in annualized revenue in a few months.

  • · 4w ago

    Claude Cowork, launched in January, expanded agentic capabilities to general knowledge work, reportedly triggering emergency meetings at Microsoft. Q1 saw more frontier capabilities shipped than any prior quarter, with the latest Gemini, GPT, and Claude models constantly vying for narrow leads across various benchmarks.

  • · 4w ago

    Endor Labs found that switching models to Cursor's harness significantly improved benchmark scores; GPT-5.5's functionality score jumped from 61.5% to 87.2%, and Opus 4.7's rose from 87.2% to 91.1%.

  • · 5w ago

    David Sacks argues OpenAI's recent product release, GPT 5.5, has received strong reviews from developers and is taking coding market share from Anthropic's Opus 4.7, which users complain is compute-constrained and buggy.

  • · 5w ago

    Sacks notes OpenAI released GPT 5.5 Cyber, which matches Anthropic's Mythos in completing multi-step cyber attack simulations, and is likely the first such model commercially available due to OpenAI's superior compute capacity.

  • · 5w ago

    Sacks argues AI cyber models like Mythos or GPT 5.5 Cyber don't create vulnerabilities but discover existing bugs, and their proliferation will lead to a one-time upgrade cycle as systems are hardened, followed by a new equilibrium between offense and defense.

  • · 5w ago

    OpenAI released GPT 5.5, seven weeks after GPT 5.4, showcasing a 37-point increase in long-context reasoning and a 60% reduction in hallucinations compared to 5.4.

  • · 5w ago

    OpenAI's GPT-5.4 is now available as a limited preview on AWS, with GPT-5.5 coming soon, a direct result of the amended partnership ending Microsoft's exclusivity.

  • · 6w ago

    OpenAI released GPT 5.5 on Friday at 2 p.m., describing it as a 'new class of intelligence for real work' empowering agents to understand complex goals and use tools for task completion.

  • · 6w ago

    GPT 5.5 significantly outperformed Anthropic's Opus 4.7 on several agentic benchmarks, including Terminal-Bench 2.0 and GDPval.

  • · 6w ago

    Artificial Analysis ranks GPT 5.5 as the clear number one model on its intelligence index, breaking a three-way tie with Anthropic and Google by three points.

  • · 6w ago

    Despite strong overall performance, GPT 5.5 lagged behind Opus 4.7 on Vals AI's professional task benchmarks and on SWE-Bench Pro, a coding benchmark.

  • · 6w ago

    Theo notes GPT 5.5 costs $5 per million input tokens and $30 per million output tokens, double the price of GPT 5.4 and 20% higher than Opus 4.7.

  • · 6w ago

    Scaling01 estimates GPT 5.5's parameters are 2-5 trillion, compared to Mythos at approximately 10 trillion and GPT 5.4 at 1-2 trillion.

  • · 6w ago

    Many users found GPT 5.5 to be the new standard, significantly faster and easier to collaborate with than Opus 4.7, and the strongest model for engineering tasks.

  • · 6w ago

    Matt Shumer notes that while GPT 5.5 is a 'massive leap forward,' 99% of users may not notice a dramatic difference because previous models were already highly capable for most routine tasks.

  • · 6w ago

    Bindu Reddy and CodeRabbit found GPT 5.5 superior for coding tasks, with CodeRabbit reporting a 79.2% expected-issue-found rate in code review, versus a 58.3% baseline.

  • · 6w ago

    Peter Gostev and Aidan McLaughlin observed GPT 5.5's greatly improved reliability on long-running tasks, with tasks running successfully for 7-8 hours, or even 31 hours, continuously.

  • · 6w ago

    Nathaniel Whittemore found GPT 5.5 significantly better at writing, following instructions for a clear, journalistic style without the 'dramatic flair' often seen in Opus models.

End of 90-day results — 51 results