Anthropic's Trillion-Dollar Friday, the Lease That Wasn't, and Open Models Falling Further Back

One day, three Anthropic releases worth $1 trillion in market signal. The SpaceX deal we covered last week turns out to have a 90-day cancellation clause hiding under the headline. The benchmark gap between open and closed models is widening, not narrowing. And the in-house chip trend extends to ByteDance and Mistral on the same day.

Anthropic Stacks Three Releases on the Same Day

A $65 billion Series H, a new flagship model, and a new orchestration primitive for Claude Code. Each on its own would have been the headline. Together they describe a company moving on revenue, product, and architecture simultaneously.

Anthropic closes $65B Series H at a $965B valuation — $47B run-rate, roughly a 21x multiple. Altimeter, Dragoneer, Greenoaks, and Sequoia leading. Strategic semiconductor partners Micron, Samsung, and SK hynix joining as infrastructure stakeholders. October IPO target sits five months out.
Anthropic releases Opus 4.8 with effort controls and cheaper fast mode — Benchmark gains, an adjustable effort slider from Low through Max, and a fast mode at $10/$50 per million tokens (three times cheaper than before). Opus 4.8 is four times less likely than 4.7 to let flaws in its own code pass without flagging them.
Claude Code’s Dynamic Workflows rewrote Bun from Zig to Rust in 11 days — Hundreds of parallel subagents converging against the existing test suite. Jarred Sumner produced 750,000 lines of Rust with a 99.8% test pass rate as the headline case study.

The Compute Story Gets More Honest

Last week’s headline was a $45B Anthropic-SpaceX commitment. This week it turns out to include a 90-day exit. Plus open models are falling further behind closed ones, which strengthens the Western IPO narrative just in time for Anthropic to test it.

Musk reframes the SpaceX-Anthropic deal but the S-1 tells a different story — Musk says it’s a 180-day lease with 90-day mutual cancellation. SpaceX’s S-1 says Anthropic agreed to pay through May 2029 across four separate pages. The $45 billion only holds if all 36 months close.
Open models are 4-6 months behind closed ones and falling further back — The gap was narrowest at DeepSeek R1’s release and has widened since. The open-source-pressure thesis on frontier pricing depends on the gap narrowing. The data shows the opposite, which strengthens Anthropic’s $965B valuation defense.

The Infrastructure Layer Splinters

MiniMax teases 15.6x faster decode at long context. Sakana claims a backprop-free training method. SpaceX is reportedly shipping a C-based training stack with an order-of-magnitude speedup. And Microsoft is now building its own coding model after pulling Claude Code licenses earlier this month.

MiniMax teases M3 with sparse attention that runs 15.6x faster at long context — Long-context inference is the dominant cost driver for production agents. Chinese AI labs are competing on cost-per-useful-output, not headline benchmarks.
Sakana Labs claims a block-wise training method that bypasses backprop — A diffusion-style forward pass on independent blocks reportedly cuts the memory needed to train deep networks. Announced via X thread, not peer-reviewed.
Musk says SpaceX is shipping a custom C-based AI training stack soon — Pipeline-parallel architecture targeting 220k GB300s with 800G NICs. Order-of-magnitude speedup claimed over PyTorch. Extraordinary claims that warrant extraordinary verification.
Microsoft is reportedly building its own AI coding model — After pulling internal Claude Code licenses earlier this month, Microsoft is now developing a proprietary coding model. Reframes the earlier procurement move as strategic positioning against Anthropic, Cursor, and Cognition.

Two Insights Worth Sitting With

An eval methodology that handles long-trajectory agents better than single-shot LLM judges, and a sharp re-framing of the data-scarcity narrative everyone’s been worried about.

Judgment Labs publishes Agent Judge to fix long-context eval failures — Search, verification, and adaptation as the three pillars. Verifies the agent’s claimed actions against actual system state, not just the trajectory text. Reports 0.86 accuracy after five rubric-refinement passes.
Asuka Zheng: the data scarcity panic misses what’s actually missing — It’s not training data in general. It’s end-to-end long-horizon trajectories: SRE incidents from first signal through resolution, legal matters from intake through outcome. The missing category is the multi-step processes nobody has historically captured.

Today’s Quick Hits

NVIDIA’s gamma-World adds independently controllable multi-agent rollouts — A permutation-symmetric world model that generalises zero-shot from two-player to four-player settings, running at real-time rollout speed.
ByteDance moves to design its own chips to escape supply constraints — TikTok’s parent joins Amazon Trainium, Google TPU, Microsoft Maia, Alibaba T-Head, and others. The AI compute layer is bifurcating between general-purpose Nvidia and workload-specific custom silicon.
Mistral CEO confirms plans to design custom chips — Europe’s leading frontier AI lab will design its own silicon as it ramps EU data center build-out, positioning Mistral as an anchor for an EU-sovereign AI compute stack.
OpenAI publishes a Frontier Governance Framework — A voluntary self-disclosure framework covering risk management, model reporting, incident response, and oversight. Useful as a reference for what disclosure categories OpenAI considers material.