OpenAI’s grip on the AI market slipped below a majority for the first time, Google lost one of its most celebrated researchers to OpenAI, and a new benchmark found that the best frontier models still fail nearly two in three real scientific tasks. The day’s news makes a single argument: the era of unchallenged leadership is over.
The Monopoly Cracks: ChatGPT Loses Its Majority as OpenAI Lands a Gemini Architect
Two data points arrived within hours of each other, and together they reframe OpenAI’s position in the market. The company that defined the category now holds less than half of it, even as a co-author of the attention paper leaves Google to join it.
- ChatGPT falls below 50% market share for the first time. Sensor Tower data puts OpenAI’s assistant at 46.4% of the AI app market by May, as Gemini and Claude each post gains that were unimaginable during ChatGPT’s monopoly years. The number matters because market share is a leading indicator of where distribution, pricing power, and developer defaults go next.
- Noam Shazeer leaves Google for OpenAI. A co-author of “Attention Is All You Need” and Gemini co-lead, Shazeer exits Google less than two years after a $2.7 billion deal brought him back from Character.AI. Whether OpenAI recruited him or he chose the move matters less than what it signals: the talent market at the frontier is not stable.
- OpenAI Retires ChatGPT Pulse, Unifies Tasks Into One Hub. The morning-briefing feature is being sunset over 14 days as OpenAI folds proactive notifications into a rebuilt scheduled-tasks system with web monitoring built in. Consolidation at this pace usually signals that a company has settled on what it plans to ship next.
The Frontier Reality Check: Benchmarks and Cost Tests Expose Hard Limits
Three separate evaluations dropped today, and none was flattering to frontier pricing or frontier claims. Real tasks, real money, and real vulnerabilities all returned the same verdict: more compute and higher prices do not automatically buy better results.
- OpenAI’s new life-science benchmark hits a 36% ceiling. LifeSciBench tested frontier models on 750 real research tasks judged by PhD scientists, and the best system passed just over one in three. A benchmark graded by working researchers rather than automated scripts is harder to dismiss, and a 36% ceiling resets expectations for any team planning to deploy AI in drug discovery or clinical research.
- More Reasoning Does Not Mean Better Security Triage. A 26-model experiment across Claude and GPT families found that cranking reasoning effort to maximum often hurts rather than helps when triaging real vulnerabilities. Security teams currently routing CVE triage through high-reasoning-mode models should run their own version of this test before expanding usage.
- Kimi K2.7 Code ran 16x cheaper than Claude Fable 5 in a landing-page test. A builder generated 12 landing pages with each model and found Kimi cut costs by 94%, raising pointed questions about when frontier pricing is actually worth it. For any team running high-volume generation tasks, the gap is now wide enough that paying frontier prices requires a deliberate justification.
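The "16x cheaper" and "94%" figures in the Kimi item are the same measurement stated two ways. A minimal sketch of the conversion, assuming only the 16x ratio reported in the test; the function names are illustrative and not taken from the original write-up:

```python
def savings_from_ratio(cost_ratio: float) -> float:
    """Fractional cost reduction when the cheaper option costs 1/cost_ratio as much."""
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return 1.0 - 1.0 / cost_ratio


def ratio_from_savings(savings: float) -> float:
    """Inverse: how many times cheaper a given fractional saving implies."""
    if not 0 <= savings < 1:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - savings)


print(f"{savings_from_ratio(16):.2%}")     # 93.75%
print(f"{ratio_from_savings(0.94):.1f}x")  # 16.7x
```

A 16x ratio works out to a 93.75% saving, which rounds to the 94% quoted. Going the other way, a saving of exactly 94% would imply roughly 16.7x, so the two published figures agree to within rounding.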
New Model Bets: Scale, Sparsity, and Robot Motion
Three labs announced or previewed new models today, ranging from a 1.5-trillion-parameter coding bet to a robotics architecture built on 1.16 million training videos. The common thread is that frontier model building is no longer only a Big Tech sport.
- Cursor is building a 1.5-trillion-parameter model from scratch. The coding-tool company trained its own frontier model on xAI’s Colossus supercomputer and claims size parity with the largest known frontier models, though no independent benchmarks are available yet. A developer-tools company owning its own frontier model changes the competitive dynamics between coding assistants and the labs they previously depended on.
- Mistral Previews a Large Sparse MoE This Summer, Opens July Access. Arthur Mensch announced a new model arriving this summer, the first of a fresh family built on a large mixture-of-experts architecture, with open weights and a July early-access program. Mistral’s return to a genuine frontier release (rather than an efficiency play) is the clearest sign yet that the open-weight tier is not ceding the capability race.
- Ai2 ships MolmoMotion to close robotics’ language gap. A 1.16-million-video dataset and a dual-architecture model let language instructions drive 3D motion prediction, pushing robot pick-and-place success from 56% to 76%. A 20-point jump on a physical task is the kind of gain that shifts robotics from demo to deployment planning.
Agents Get Infrastructure: The Stack Solidifies
Two Vercel announcements today address the same underlying problem: agents built on long-lived secrets and ad-hoc plumbing fail in production. The solutions point toward a maturing infrastructure tier where durable execution and short-lived credentials become table stakes.
- Vercel ships eve, an open-source production runtime for agents. The framework bakes in durable execution, sandboxed compute, human-in-the-loop approvals, and evals so developers write behavior rather than plumbing. Teams currently gluing together separate orchestration, sandbox, and eval layers should evaluate whether eve collapses that stack before building more custom infrastructure.
- Vercel Connect Ends Long-Lived Tokens for Agent Workflows. The platform replaces persistent provider secrets with runtime credential exchange, issuing short-lived, task-scoped tokens each time an agent needs to act. Any agent workflow that holds long-lived API keys carries a larger blast radius if a key leaks; Connect’s model is the right direction even for teams not on Vercel.
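The general idea behind short-lived, task-scoped tokens can be sketched in a few lines. The following is an illustration of the pattern only, not Vercel Connect’s API: the broker key, function names, claim fields, and scope strings are all hypothetical, and a real deployment would rely on an established token format and key management rather than this hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

# Hypothetical broker-side signing key; in this sketch it never leaves the broker.
BROKER_KEY = secrets.token_bytes(32)


def issue_token(agent_id: str, scope: str, ttl_seconds: int = 60,
                now: Optional[float] = None) -> str:
    """Mint a signed token valid for one scope and a short time window."""
    issued = time.time() if now is None else now
    claims = {"agent": agent_id, "scope": scope, "exp": issued + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    signature = hmac.new(BROKER_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def verify_token(token: str, required_scope: str,
                 now: Optional[float] = None) -> bool:
    """Accept only if the signature matches, the scope matches, and it has not expired."""
    current = time.time() if now is None else now
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return False  # no separator, so the token is malformed
    expected = hmac.new(BROKER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; a tampered payload yields a different signature.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["scope"] == required_scope and current < claims["exp"]


token = issue_token("agent-1", "repo:read", ttl_seconds=60, now=1000.0)
print(verify_token(token, "repo:read", now=1030.0))   # True
print(verify_token(token, "repo:write", now=1030.0))  # False: wrong scope
print(verify_token(token, "repo:read", now=1061.0))   # False: expired
```

The point of the design is that a leaked token is useful only for one scope and only until it expires, whereas a leaked long-lived API key stays useful until someone notices and rotates it.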
Today’s Quick Hits
- Replit Plugs Into Claude, Closing the Design-to-Deploy Gap. Anthropic’s Claude can now hand off projects directly to Replit for building and shipping, removing the context switch that killed most AI-assisted prototypes before launch.
- NVIDIA Ships Open XR AI Stack for AR Glasses Agents. A public-beta library lets developers wire live camera, microphone, and enterprise tool calls into a single runtime for AI glasses, AR headsets, and XR devices.