Enterprise AI buyers are routing around frontier models

Harvey cut inference costs 3x by mixing Claude Opus with a smaller model, signaling that frontier capacity is becoming a niche tier, not the default.

Alessandro Benigni

PUBLISHED JUN 11, 2026

1 MIN READ

Follow on Google

YESTERDAY

Enterprise AI buyers are routing around frontier models — featured image for AI Insiders

Legal AI startup Harvey reduced inference costs by 3x without a quality regression by routing low-complexity tasks to Fireworks AI’s GLM 5.1 and reserving Claude Opus for the hardest work, TechCrunch reported on June 9. The test is one of the first public datapoints showing that model routing, not model replacement, is where enterprise procurement decisions are heading.

Coinbase co-founder Brian Armstrong put a number on the thesis: 80 percent of workloads will run on models priced 99 percent cheaper than frontier within 12 to 18 months, with frontier capacity reserved for the 20 percent of tasks where raw capability is non-negotiable. The real division, TechCrunch notes, is not proprietary versus open-weight but large versus small.

That split reframes the workflow-as-moat argument from the builder side to the procurement side. If the engineering problem is building a routing layer that sends each task to the cheapest model that clears a quality threshold, then frontier subscriptions become a precision instrument, not a baseline. Teams still evaluating whether to standardize on a single frontier provider should price out a tiered routing architecture before renewing.

Reported by Russell Brandom for TechCrunch (techcrunch.com), published June 9, 2026.

Enterprise AI buyers are routing around frontier models

The morning brief for people inside the AI industry.

More in Wire

Kernel fusion is where PyTorch inference speed actually hides

NVIDIA ships open-source scanner for agent skill supply-chain risk

Cursor's Bugbot is 3x faster, 22% cheaper, and catches more bugs