Microsoft published the release card for MAI-Code-1-Flash with a metric no major lab had attached before: average token usage per task. The model hits 71.6 on SWE-Bench Verified while consuming roughly a third of the tokens Claude Haiku 4.5 burns on the same benchmark.

Venture capitalist Tom Tunguz, writing on tomtunguz.com, called it the opening signal of a procurement shift. Buyers can now compare models on two axes at once: raw score and the cost to achieve it. A 95-percent result that burns three times the tokens is no longer self-evidently better than a 92-percent result at baseline cost.

The framing lands against a backdrop of real corporate sticker shock. Uber capped employee AI spending after exhausting its budget in four months. Salesforce committed $300M on Anthropic tokens and froze engineering hires. Microsoft itself pulled Claude Code licenses across its Windows and Microsoft 365 divisions when usage outran budget.

Procurement teams now have a vendor-supplied number to enforce what finance already wants. Other providers will face pressure to match the format within this quarter or let Microsoft define the efficiency narrative alone.

Tom Tunguz on tomtunguz.com, 2026-06-03.