Open models are 4-6 months behind closed ones and falling further back

Open models trail the frontier by four to six months on public benchmarks, and the gap is growing. That finding, published on LessWrong on May 29, lands at an awkward moment: the week Anthropic closed a $65 billion raise and the same month DeepSeek announced a 75 percent discount on its V4 Pro API.

The capability gap was narrowest around the release of DeepSeek R1 in early 2025. At that moment, open-weight models came closer to matching closed-model performance than at any prior measurement. Since then, closed-model improvements have outpaced open releases. The four-to-six month figure is a benchmark-time lag: the best publicly available open model today scores about as well as the best closed model did four to six months earlier.
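The benchmark-time lag described above can be made concrete with a small calculation. The Python sketch below is illustrative only: the function name `benchmark_time_lag`, the data layout, and every score and date are hypothetical placeholders, not figures or code from the LessWrong post. It assumes a single benchmark where higher scores are better.

```python
from datetime import date

def benchmark_time_lag(open_score, as_of, closed_history):
    """Days between `as_of` and the earliest date a closed model
    first scored at least `open_score`.

    closed_history: list of (release_date, best_closed_score) pairs.
    Returns None if no closed model had reached the score by `as_of`.
    """
    matched = [d for d, s in closed_history if s >= open_score and d <= as_of]
    if not matched:
        return None
    return (as_of - min(matched)).days

# Hypothetical numbers for illustration; not data from the post.
closed_history = [
    (date(2025, 9, 1), 70.0),
    (date(2025, 12, 1), 78.0),
    (date(2026, 3, 1), 85.0),
]

# Best open model scores 78.0 as of 2026-05-01 (made-up values).
lag_days = benchmark_time_lag(78.0, date(2026, 5, 1), closed_history)
print(lag_days)  # 151 days, about five months
```

A widening gap, in these terms, means the lag computed this way grows from one measurement date to the next.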

The methodology leans on public benchmark scores rather than deployment telemetry. That distinction matters. Public benchmarks are a well-understood proxy, but one that invites gaming, saturation, and cherry-picking. Closed labs choose which results to publish; open-model evaluators compare against whatever the closed labs put forward. The analysis does not claim to measure deployment capability, latency, cost-adjusted performance, or fine-tuning headroom. It measures what is measurable, and the finding should be read with that limit in mind.

Still, the directional finding is significant. The open-source-pressure thesis on frontier pricing holds that as capable open models close the gap, closed-model labs face a ceiling on what they can charge. If buyers can route workloads to a model four months behind frontier at a fraction of the cost, pricing power at the top erodes. This site covered that thesis on May 21 when discussing cheap models eating into the IPO narratives building around Anthropic and OpenAI. The benchmark trend runs counter to the thesis. The gap is not narrowing. It is widening.

DeepSeek’s position in this picture is complicated. The Hangzhou-based lab, whose V3 and R1 releases shipped at a fraction of frontier training costs, was the primary catalyst for the compression seen at R1’s release. The 75 percent discount on V4 Pro API access reported May 26 signals a continued push on price. The $10 trillion grand-strategy framing covered May 27 suggests DeepSeek is playing for infrastructure dominance, not model parity. Those two moves are not contradictory, but the benchmark data suggests that price competition and capability competition are separating. DeepSeek is winning on price. On raw benchmark trajectory, the closed-model labs are pulling ahead.

For the Western IPO narrative, that separation is useful. The valuation behind Anthropic’s $65 billion raise, announced today, rests partly on the argument that frontier capability commands durable pricing power. If open-source pressure is weakening rather than intensifying, the pricing-power argument strengthens. The bear case for that valuation required open models to keep closing the gap. The current data does not support that.

The skeptical read is worth stating directly. Public benchmarks are an imperfect and partially adversarial surface. Closed-model labs publish selectively. Open models often retain real advantages in fine-tuning flexibility, on-premises latency, data-privacy guarantees, and total cost of ownership at scale. A four-to-six month benchmark lag does not translate cleanly into a four-to-six month capability lag for every enterprise use case. Some workflows run fine on a model from six months ago. For others, only the frontier will do.

What the data does not show is convergence. The post does not claim open models are catching up on a longer horizon, or that the widening is temporary. It reports the current direction. The direction, as of late May 2026, is divergence.

Enterprise teams currently routing workloads under the assumption that open models are near-equivalent substitutes for closed ones should re-examine that routing logic against the benchmark trajectory before the next procurement cycle.

Posted on LessWrong on 2026-05-29.
