Calacanis notes that model capabilities are asymptoting, with Opus, GPT-5, and Sonnet scoring within tenths of a percentage point of one another on evals. This commoditization raises questions about the return on massive training spend and increases enterprise demand for abstraction layers that let teams hot-swap models.
Milan says proprietary image models like Midjourney and GPT Image 2 offer higher quality than open-source options, but they impose stricter content censorship than local fine-tuned models.
GPT-5.5 achieved 25% accuracy on the Future Sim benchmark, beating Polymarket crowd predictions for events like the Super Bowl. Alex Wissner-Gross frames this as the worst future state for AI-powered 'psychohistory' predictive models.
Z.ai's GLM 5.1, a 754-billion-parameter open-source model, reportedly scored 58.4 on SWE-Bench Pro, ahead of GPT-5.4 and Opus 4.6. The company claims the model completed an 8-hour autonomous task, building a Linux desktop.
Gemini 3.5 Flash is Google's new high-throughput model, four times faster than other frontier models in output tokens per second. Alex Wissner-Gross argues it is solidly mid-tier in raw capability compared to GPT-5.5 High.
Gemini 3.2 Flash reportedly achieves 92% of GPT-5.5 performance on coding tasks with 15-20x cheaper inference.
The quarter saw rapid frontier model releases: GPT-5.2 Codex, Genie 3, Opus 4.6, GPT-5.3 Codex, Sonnet 4.6, Gemini 3.1 Pro, Nano Banana 2, and GPT-5.4, with no single benchmark winner across common tests.
OpenAI traced a 'goblin' bug in Cursor to a personality reinforcement learning artifact from GPT-5 models. The quirk highlights how model interdependencies can amplify unusual behaviors.
An AI town simulation experiment found that Claude agents created orderly democracies, GPT agents talked but built nothing, and Grok agents descended into violence, with all of them dead within four days.
OpenAI released three new voice models to its API: GPT Realtime 2 for agentic tasks, GPT Realtime Translate for over 70 languages, and GPT Realtime Whisper for streaming transcription.