Custom Beats Frontier

Today’s stories trace a shift from bigger models to better-fitted ones, alongside the infrastructure and politics catching up to both.

The specialized model beats the frontier model: proof arrives

A cluster of releases this week makes the same argument from different angles: a smaller model built for one job now beats a general model built for every job.

Thinking Machines beats frontier models with a finance-tuned small model. A model fine-tuned on expert investor labels hit 84.7 percent accuracy on financial filtering tasks, beating every frontier model tested at a fraction of the cost.
PorTAL lets teams stop re-tuning every time a new model ships. The architecture separates task adaptation from base model weights, aiming to make fine-tuning a one-time cost instead of a recurring one.
The app layer’s moat problem has a name: product shape. Scott Stevenson argues fine-tuning and model routing cannot protect AI application companies from the labs they depend on.

Agents that learn on the job: harnesses give way to feedback loops

Two separate teams are building infrastructure for agents that update themselves mid-task instead of relying on fixed weights and static scaffolding.

Introspection bets the next agent product is a feedback loop. Ex-xAI engineers are building infrastructure for agents that maintain themselves, arguing harnesses were only step two.
A new harness lets agents keep learning on the fly in ARC-AGI-3. Continual Harness targets the gap between a model’s fixed weights and an agent’s need to update its understanding as a task unfolds.

Infrastructure moves: labs reposition compute and capacity

Behind the model headlines, the underlying infrastructure is shifting: Meta is eyeing a new revenue line, Google is testing in public, and OpenAI is defending scale it already has.

Meta is quietly building a cloud business to sell AI compute. The company is developing plans to sell surplus AI computing power and hosted models to outside developers, according to Bloomberg.
Google tests new Gemini Flash build on LM Arena. A stronger Flash checkpoint is circulating on the benchmark site, and Google’s Arena history suggests a launch could follow.
How OpenAI keeps voice AI fast for 900 million weekly users. OpenAI split WebRTC into a stateless relay and a stateful transceiver, using a protocol field as a routing key to avoid database lookups on Kubernetes.
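
One way to picture the routing trick in the OpenAI item above: if a field the client echoes on every packet already names the backend that owns its session, the relay can forward traffic without holding state or querying a database. The sketch below is illustrative only. The field format, the backend names, and the functions `make_routing_field` and `route` are assumptions made for this example, not OpenAI's actual design.

```python
import hashlib

# Hypothetical pool of stateful transceivers; names are illustrative only.
BACKENDS = ["transceiver-0", "transceiver-1", "transceiver-2"]


def make_routing_field(session_id: str) -> str:
    """Pick a backend once at session setup and embed its index in the field.

    Assumption: the client repeats this field unchanged on every packet.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(BACKENDS)
    return f"{index}:{session_id}"


def route(field: str) -> str:
    """Stateless relay step: read the destination from the field alone.

    No session table or database is consulted; the field is the routing key.
    """
    index_text, _, _ = field.partition(":")
    return BACKENDS[int(index_text)]


field = make_routing_field("abc123")
print(route(field))  # same backend on every call for this session
```

Because the key travels with the packet, any relay instance can handle any packet, which is what lets the relay tier scale independently of the stateful tier in this simplified picture.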

Washington and the labs: equity, access and jobs data that pushes back

Policy and labor questions are catching up to the model race, with Washington angling for a stake and new data complicating the doom narrative.

OpenAI pitches US a 5 percent stake across every frontier lab. Sam Altman’s proposal would hand Washington equity in OpenAI, Anthropic, Google and Meta through one sovereign fund vehicle.
New data undercuts the AI jobs apocalypse story. Ramp and Revelio Labs find that heavy AI spenders grew headcount by 10.2 percent and entry-level roles by 12 percent over two years.

Today’s Quick Hits

Anthropic sets a July 7 deadline as Fable 5 access returns. Fable 5 counts against half of weekly limits only through July 7, and Mythos 5 stays gated behind a government-approved allowlist called Glasswing.
Dwarkesh Patel’s essay contest names three winners on AI’s big bets. From ending airborne disease to a subway-style business model for AI labs, the winning essays reframe how founders and policymakers should think about the next decade.