Cognition published details Monday on Devin Fusion, a multi-model harness that pairs a frontier reasoning model with a cheaper “sidekick” agent to reduce the cost of running its autonomous coding tool without degrading output quality.

The architecture splits responsibility between two parallel agents. A frontier model handles planning, ambiguity resolution, and final review. A cost-effective sidekick model takes on the bulk of execution work under the frontier model’s direction. Each agent maintains its own persistent, cached context, which Cognition says avoids the expensive cache-miss penalty that plagues simpler model-switching approaches, in which every handoff forces the incoming model to reprocess the context from a cold cache.
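To make the cache argument concrete, here is a toy accounting sketch. It is not Cognition's implementation: the class, the function names, and the token counts are all hypothetical, and it assumes only that a prompt cache belongs to a single model, so a model with a cold cache must process its whole context uncached.

```python
from dataclasses import dataclass


@dataclass
class ModelContext:
    """Hypothetical per-model context that tracks how many tokens are cached."""
    name: str
    cached_tokens: int = 0

    def call(self, context_tokens: int) -> int:
        """Return the uncached tokens processed on this call, then warm the cache."""
        uncached = max(context_tokens - self.cached_tokens, 0)
        self.cached_tokens = max(self.cached_tokens, context_tokens)
        return uncached


def naive_switching(turns):
    """Every switch starts the incoming model from a cold cache."""
    uncached_total = 0
    active = None
    for model_name, context_tokens in turns:
        if active is None or active.name != model_name:
            active = ModelContext(model_name)  # previous cache is not reusable
        uncached_total += active.call(context_tokens)
    return uncached_total


def persistent_contexts(turns):
    """Each model keeps its own context alive, so its cache stays warm."""
    contexts = {}
    uncached_total = 0
    for model_name, context_tokens in turns:
        ctx = contexts.setdefault(model_name, ModelContext(model_name))
        uncached_total += ctx.call(context_tokens)
    return uncached_total


# Illustrative numbers only: (model, context size in tokens at that call).
turns = [("frontier", 1000), ("sidekick", 2000),
         ("frontier", 3000), ("sidekick", 4000)]
naive_total = naive_switching(turns)
persistent_total = persistent_contexts(turns)
```

In this toy trace the naive approach reprocesses every token on every handoff, while the persistent version pays only for tokens added since each model's previous call. The size of the gap in a real system depends on provider pricing and cache lifetimes, neither of which is modeled here.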

According to Cognition’s announcement, the harness achieves a 35% cost reduction compared to running a pure frontier configuration on FrontierCode, a coding benchmark the company developed internally to measure both code correctness and quality. When paired with Fable 5, Anthropic’s most capable model before its government-ordered suspension in June, the cost reduction rises to 41% with no performance drop.

That 41% figure deserves scrutiny. Cognition notes in its own post that the Fable 5 results were gathered on an earlier version of the Devin Fusion harness, before the team had a chance to run its standard tuning passes. The non-Fable numbers reflect multiple rounds of optimization; the Fable numbers do not. The gap between the 35% and 41% figures may widen once equivalent tuning is applied to Fable-based runs, but that comparison has not been made.

The benchmark itself also merits a flag: FrontierCode is Cognition’s proprietary evaluation suite, not an independent third-party measure. A vendor reporting efficiency gains on its own benchmark tells you something, but not everything. Cognition acknowledges it wants to test the harness on a broader task distribution than its internal usage covers, which is an honest admission that the published numbers are a starting point, not a settled claim.

A second mechanism, dynamic mid-session routing, lets the system switch between models during execution rather than committing to one at task start. Cognition runs lightweight classifiers during context compaction windows to reassign the active model. Because compaction rewrites the context and would trigger a cache miss anyway, a switch timed to that moment adds no extra cache cost.
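The routing idea can be sketched in a few lines. Everything below is an assumption made for illustration: the threshold, the keyword-based classifier, and the Session class are hypothetical stand-ins, not Cognition's code. The point is only that the model choice is revisited at compaction time and nowhere else.

```python
from dataclasses import dataclass, field

COMPACTION_THRESHOLD = 8_000  # hypothetical token budget that triggers compaction


def classify(recent_steps):
    """Hypothetical stand-in for a lightweight classifier."""
    hard_markers = ("ambiguous", "design", "review")
    needs_frontier = any(m in step for step in recent_steps for m in hard_markers)
    return "frontier" if needs_frontier else "sidekick"


@dataclass
class Session:
    active_model: str = "frontier"
    context_tokens: int = 0
    recent_steps: list = field(default_factory=list)
    switches: int = 0

    def step(self, description: str, tokens: int) -> None:
        """Record one unit of work; routing is only reconsidered at compaction."""
        self.recent_steps.append(description)
        self.context_tokens += tokens
        if self.context_tokens >= COMPACTION_THRESHOLD:
            self._compact_and_maybe_switch()

    def _compact_and_maybe_switch(self) -> None:
        # Compaction rewrites the context, so the cached prefix is lost
        # whether or not the model changes. Switching here is "free".
        self.context_tokens //= 4  # toy stand-in for a summarized context
        choice = classify(self.recent_steps)
        if choice != self.active_model:
            self.active_model = choice
            self.switches += 1
        self.recent_steps.clear()


session = Session()
session.step("rename variables", 5000)          # below threshold, no routing
session.step("apply formatting", 4000)          # compaction, routine work
model_after_routine = session.active_model
session.step("ambiguous design question", 6000)  # compaction, hard work
model_after_hard = session.active_model
```

A production router would presumably use a trained classifier over richer signals than step descriptions; the keyword match here only keeps the sketch self-contained.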

The real-world sanity check Cognition offers is thin but directionally useful: internally, 88% of merged pull requests driven by Devin were handled entirely by the automated Fusion router. That number comes from internal usage with a self-selected population, so it should not be read as a general efficiency claim. It does suggest the routing logic is not breaking down catastrophically in practice.

For teams currently running agentic coding workflows at scale, the economics argument here is straightforward. Frontier model calls cost materially more per token than mid-tier alternatives, and most subtasks in a long coding session do not require frontier-level reasoning. The sidekick model handles the routine execution; the frontier model handles the judgment calls. That division of labor is not new as a concept, but Cognition’s handling of cache efficiency and mid-session switching addresses a practical failure mode that has made earlier multi-model approaches brittle.
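The arithmetic behind that argument is a weighted average. The sketch below uses made-up prices and a made-up traffic split purely to show the shape of the calculation. It ignores caching discounts and any difference in token counts between models, and it is not an attempt to reproduce Cognition's 35% figure.

```python
def blended_savings(frontier_price: float, sidekick_price: float,
                    frontier_share: float) -> float:
    """Fractional saving versus sending every token to the frontier model.

    Prices are per token (or per million tokens, as long as both use the
    same unit). frontier_share is the fraction of tokens the frontier
    model still handles.
    """
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    blended = (frontier_share * frontier_price
               + (1.0 - frontier_share) * sidekick_price)
    return 1.0 - blended / frontier_price


# Hypothetical inputs: frontier at 10 units, sidekick at 2 units,
# half of all tokens still routed to the frontier model.
example = blended_savings(10.0, 2.0, 0.5)
```

With those placeholder inputs the blended price is 6 units against 10, a 40% saving. Teams can substitute their own provider prices and observed routing split to get a first-order estimate before running real task samples.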

Devin Fusion is available in preview at app.devin.ai. Engineering teams considering autonomous coding agents should run their own task samples through the preview before treating the 35% figure as a planning input.

Source: Cognition company blog (cognition.com), published June 29, 2026.