Networks of smaller models now beat frontier AI on cost and speed

Andrew Trask argues the era of centralized AI dominance is over; the economics are partly right, but the coordination problem is real and unresolved.

Alessandro Benigni

PUBLISHED JUN 16, 2026

Andrew Trask, a former DeepMind researcher writing on his Substack late last week, made a claim that deserves a clear-eyed read: networks of smaller AI models now outperform any single frontier system on speed, accuracy, and cost. His conclusion is that frontier AI companies “will never exceed the AI capability frontier again.” That is a sweeping statement. Some of it is well-grounded. Some of it is not.

The strongest part of Trask’s argument is the ensemble mathematics. He correctly describes a technique that has been foundational in machine learning for decades: when you combine outputs from multiple models whose errors are not strongly correlated, those errors tend to cancel, yielding higher aggregate accuracy than any individual model achieves alone. He notes this approach was effectively banned from NeurIPS competition submissions because it made results incomparable, which means it has been underused in public benchmarking relative to its actual power. That observation is accurate and worth taking seriously.
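To make the error-cancelling claim concrete, here is a minimal sketch of the textbook majority-vote calculation. It is not Trask's method or his numbers. It assumes a right-or-wrong task, models that are each correct with the same probability, and fully independent errors; the function name and the 0.6 accuracy figure are illustrative only.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p_correct: float) -> float:
    """Probability that a strict majority of n_models is right.

    Assumes every model is correct with probability p_correct and that
    their errors are independent (an idealization, not a measured fact).
    """
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    needed = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


if __name__ == "__main__":
    # 0.6 is a placeholder per-model accuracy, not a benchmark result.
    for n in (1, 5, 25):
        print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions, five models that are each right 60% of the time reach roughly 68% as a group, and the figure keeps climbing as models are added. Real models share training data and failure modes, so their errors are correlated and the actual gain is smaller than this idealized curve suggests.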

The economic argument also holds at a surface level. Open-source and third-party models, routed through platforms like OpenRouter, are now priced at inference cost alone, without the training cost amortized into the price. Ensemble pipelines built over those models can reach frontier-class accuracy at materially lower per-query cost. Trask cites a Stanford student demonstrating this publicly. He also says he ran the experiment himself six months ago, reporting results in the low 50s on what he describes as a multiple-choice benchmark, measured against frontier models. Independent corroboration exists; this is not a purely theoretical claim.

The historical analogy is vivid: mainframes in the 1960s, the ARPANET, TCP/IP linking distributed nodes into a network more powerful than any individual machine. The parallel is intuitive. But it is also where the argument starts to slide past the hard problems.

Trask does not address coordination overhead in any depth. Running 50 models in parallel and combining their outputs is not a consumer product today; it is an engineering project. The router that decides which models to query, how to weight their outputs, and how to handle latency spikes is doing real work, and that work has a cost in latency, engineering hours, and reliability surface area. Trask acknowledges that “time to first token” takes a hit in ensemble configurations. For applications where latency is load-bearing, that concession is significant.
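The work that router does can be sketched in a few lines. This is an illustration, not a description of OpenRouter or of any system Trask describes. The `fake_model` coroutine stands in for a real network call, and the names, delays, and deadline are all hypothetical.

```python
import asyncio
from collections import Counter


async def fake_model(name: str, answer: str, delay_s: float) -> str:
    # Hypothetical stand-in for a call to a hosted model.
    await asyncio.sleep(delay_s)
    return answer


async def routed_vote(calls, timeout_s: float) -> tuple[str, int]:
    """Fan out to every model, drop stragglers, majority-vote the rest."""
    tasks = [asyncio.create_task(call) for call in calls]
    done, pending = await asyncio.wait(tasks, timeout=timeout_s)
    for task in pending:
        task.cancel()  # the router gives up on models that miss the deadline
    votes = Counter(task.result() for task in done)
    if not votes:
        raise TimeoutError("no model answered before the deadline")
    winner, _ = votes.most_common(1)[0]
    return winner, len(done)


def demo() -> tuple[str, int]:
    async def main():
        calls = [
            fake_model("a", "B", 0.01),
            fake_model("b", "B", 0.02),
            fake_model("c", "C", 0.03),
            fake_model("d", "C", 1.0),  # simulated latency spike
        ]
        return await routed_vote(calls, timeout_s=0.3)

    return asyncio.run(main())


if __name__ == "__main__":
    print(demo())
```

Even this toy version has to make policy decisions: how long to wait, what to do when the slowest model would have changed the vote, and how to break ties. Without a deadline, the ensemble's latency is set by its slowest member. With one, the router trades accuracy for speed. Either way, someone has to own that trade-off.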

The ownership question also goes unexamined. The “network of neural networks” framing implies a distributed, open system, but the routers that orchestrate ensemble calls are themselves choke points. Whoever controls the routing layer controls the capability frontier in a network model just as surely as a frontier lab controls it today. OpenRouter is a private company. The analogy to the open internet is more aspirational than descriptive.

The hardest reasoning tasks, where a single chain of thought must be maintained across many sequential steps, remain a genuine weakness of ensemble approaches. Averaging or routing across models works well for classification-style problems. It works less cleanly when the task requires a coherent intermediate representation that no individual model in the ensemble has produced. Trask does not engage with this class of problem.

What is genuine in the argument is the pricing signal. Ensemble pipelines are already competitive on many production workloads that do not require frontier-level reasoning, and that range is widening as open-weight models improve. The mainframe-to-network shift Trask describes probably is happening; the timeline and the completeness of the transition are what remain uncertain.

For builders with a decision to make in the next quarter: run an honest benchmark of your actual workload against a routed ensemble before renewing any frontier API contract. For tasks involving structured output, classification, or retrieval-augmented generation, the cost delta Trask describes is real enough to test. For workloads requiring long-horizon reasoning or complex code generation, the evidence that ensembles have caught up is thinner than Trask’s framing suggests.
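A benchmark of that kind does not need much scaffolding. The sketch below compares two predictors on accuracy and cost per correct answer. The stub predictors, the four-item dataset, and the per-call prices are placeholders, not measurements; swap in your own labeled workload and your real API clients.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Report:
    accuracy: float
    cost_per_correct: float


def evaluate(
    predict: Callable[[str], str],
    examples: Sequence[tuple[str, str]],
    cost_per_call: float,
) -> Report:
    """Score one predictor on labeled (prompt, expected) pairs."""
    correct = sum(1 for prompt, expected in examples if predict(prompt) == expected)
    total_cost = cost_per_call * len(examples)
    return Report(
        accuracy=correct / len(examples),
        cost_per_correct=total_cost / correct if correct else float("inf"),
    )


# Placeholder workload: replace with real prompts and expected outputs.
EXAMPLES = [("q1", "a"), ("q2", "b"), ("q3", "c"), ("q4", "d")]
ANSWER_KEY = dict(EXAMPLES)


def frontier_stub(prompt: str) -> str:
    # Hypothetical single-model client; here it simply answers correctly.
    return ANSWER_KEY[prompt]


def ensemble_stub(prompt: str) -> str:
    # Hypothetical routed ensemble; here it misses one item on purpose.
    return "x" if prompt == "q4" else ANSWER_KEY[prompt]


if __name__ == "__main__":
    # Prices are invented for illustration only.
    print("frontier:", evaluate(frontier_stub, EXAMPLES, cost_per_call=0.010))
    print("ensemble:", evaluate(ensemble_stub, EXAMPLES, cost_per_call=0.002))
```

Cost per correct answer is only one possible yardstick. A latency-sensitive product would want to record response times alongside it before drawing any conclusion.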

Andrew Trask published this argument on his Substack on June 13, 2026.
