Legal AI startup Harvey reduced inference costs by 3x without a quality regression by routing low-complexity tasks to Fireworks AI’s GLM 5.1 and reserving Claude Opus for the hardest work, TechCrunch reported on June 9. The test is one of the first public datapoints showing that model routing, not model replacement, is where enterprise procurement decisions are heading.
Coinbase co-founder Brian Armstrong put a number on the thesis: 80 percent of workloads will run on models priced 99 percent cheaper than frontier within 12 to 18 months, with frontier capacity reserved for the 20 percent of tasks where raw capability is non-negotiable. The real division, TechCrunch notes, is not proprietary versus open-weight but large versus small.
That split reframes the workflow-as-moat argument from the builder side to the procurement side. If the engineering problem is building a routing layer that sends each task to the cheapest model that clears a quality threshold, then frontier subscriptions become a precision instrument, not a baseline. Teams still evaluating whether to standardize on a single frontier provider should price out a tiered routing architecture before renewing.
Reported by Russell Brandom for TechCrunch (techcrunch.com), published June 9, 2026.