Sakana AI ships Fugu Ultra at 93.2 on LiveCodeBench after losing Claude

Forced off Anthropic's top models by a US government directive, the Tokyo lab turned vendor dependency into a commercial argument for multi-model routing.

Alessandro Benigni

PUBLISHED JUL 1, 2026

Fugu Ultra, the top-tier product from Sakana AI (the Tokyo-based lab co-founded by former Google Brain researcher David Ha), scored 93.2 on LiveCodeBench, beating Claude Fable 5’s 89.8 on the same coding benchmark. The score is Sakana’s own; the company has not cited an independent evaluation. Pricing starts at $5 per million input tokens and $30 per million output tokens, with long-context requests above 272,000 tokens rising to $10 and $45 respectively.

The backstory matters to the pitch. Anthropic announced on June 12 that a US government directive, citing national security authorities, required it to suspend Fable 5 and Mythos 5 access for foreign nationals, including its own employees, and, to comply, to disable the models entirely for customers. Sakana, a foreign company that had been building on Claude, lost access. Ha positioned that disruption as proof of the risk. “Relying on a single company’s model for national infrastructure is a massive risk,” he wrote on X, per The Implicator. “Fugu simply routes around vendor restrictions by relying on an entirely swappable agent pool.”

The benchmark numbers extend beyond coding. Sakana claims Fugu Ultra scored 73.7 on SWE-Bench Pro, ahead of Claude Opus 4.8 at 69.2 and GPT-5.5 at 58.6, though trailing the restricted Fable 5 at 80.0. On GPQA-Diamond, both Fugu tiers scored 95.5, above the 94.6 Sakana attributed to Mythos Preview. These are company-reported figures without a named third-party auditor.

One caveat Sakana itself flags: Fable 5 and Mythos Preview are not in Fugu’s model pool because they remain publicly inaccessible, which means the benchmark comparison pits Fugu against models it cannot actually route to. The Verge’s Richard Lawler noted that the product amounts to using other frontier models more carefully, while customers get no visibility into which model handled any given request. Elie Bakouch, a research engineer at Prime Intellect, made the sharper version of this point on X, per VentureBeat: “This is a closed source orchestrator on top of closed source models. If before you didn’t control the models, now you don’t even control which ones are used or how much.”

Real-world tests complicate the benchmark lead. VentureBeat cited developer Mark Santos, who ran Fugu Ultra and Claude Opus 4.8 side by side on a Three.js build. Fugu Ultra finished in 22 minutes and cost $7.32 using roughly 89,000 tokens. Opus took 79 minutes and cost $37.85 at about 940,000 tokens. Santos nonetheless judged Opus superior on final output quality. That gap between benchmark rank and practical quality is worth tracking.

The architecture behind Fugu draws from two ICLR 2026 papers, TRINITY and Conductor, which trained coordinator models to assign tasks to specialized agents rather than using static pipelines. Sakana describes the system as a language model that calls other language models, including instances of itself, and decides when to delegate and verify. Enterprise customers can exclude specific providers from the routing pool and can opt out of prompt use for future training. The service is not yet available in the EU or EEA while Sakana works through data compliance requirements.
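To make the provider-exclusion idea concrete, here is a minimal Python sketch of a swappable agent pool. It is not Sakana's implementation: the article describes a trained coordinator model, whereas this stand-in uses a plain first-match lookup, and every name in it (Agent, AgentPool, model-a, provider_a, the skill labels) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agent:
    """A hypothetical model endpoint with the skills it is trusted for."""
    name: str
    provider: str
    skills: frozenset


@dataclass
class AgentPool:
    """A pool of agents from which whole providers can be excluded."""
    agents: list
    excluded_providers: set = field(default_factory=set)

    def available(self):
        # Drop every agent whose provider the customer has excluded.
        return [a for a in self.agents
                if a.provider not in self.excluded_providers]

    def route(self, skill):
        # Stand-in for a learned coordinator: pick the first available
        # agent that lists the requested skill.
        for agent in self.available():
            if skill in agent.skills:
                return agent
        raise LookupError(f"no available agent for skill {skill!r}")


pool = AgentPool([
    Agent("model-a", "provider_a", frozenset({"code"})),
    Agent("model-b", "provider_b", frozenset({"code", "math"})),
])

first_choice = pool.route("code")          # model-a while provider_a is allowed
pool.excluded_providers.add("provider_a")  # provider becomes unavailable
fallback = pool.route("code")              # the same request now goes to model-b
```

In this toy version, excluding provider_a reroutes the same request to model-b without any change to the calling code, which is the property the "swappable agent pool" pitch depends on.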

The pricing puts Fugu Ultra in a midmarket position: the input rate of $5 per million tokens is comparable to frontier API rates, but the output charge of $30 per million is at the high end. The token bill for a routed multi-agent task can also compound across agents in ways that standard single-model pricing does not.
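A back-of-the-envelope calculation shows how the bill can compound. The Python sketch below uses the rates quoted in this article; the token counts are illustrative, and it assumes (the article does not say) that the long-context tier is triggered when input tokens exceed the threshold and that the higher rates then apply to the whole request.

```python
from dataclasses import dataclass

# Rates quoted in the article, in USD per million tokens: (input, output).
STANDARD_RATES = (5.0, 30.0)
LONG_CONTEXT_RATES = (10.0, 45.0)
LONG_CONTEXT_THRESHOLD = 272_000  # tokens


@dataclass
class Call:
    """Token usage for one model call (illustrative numbers only)."""
    input_tokens: int
    output_tokens: int


def call_cost(call: Call) -> float:
    """Cost in USD of a single call.

    Assumption: the long-context tier is keyed on input tokens and
    applies to the entire request once the threshold is exceeded.
    """
    long_context = call.input_tokens > LONG_CONTEXT_THRESHOLD
    in_rate, out_rate = LONG_CONTEXT_RATES if long_context else STANDARD_RATES
    return (call.input_tokens * in_rate
            + call.output_tokens * out_rate) / 1_000_000


def task_cost(calls) -> float:
    """Total cost of a task made up of several calls."""
    return sum(call_cost(c) for c in calls)


# One model handling a 100,000-token prompt and writing 10,000 tokens.
single = task_cost([Call(100_000, 10_000)])

# Hypothetical routed task: three agents each re-read the same prompt.
routed = task_cost([Call(100_000, 10_000)] * 3)

print(f"single call: ${single:.2f}")  # $0.80
print(f"three agents: ${routed:.2f}")  # $2.40
```

The per-token rates are identical in both cases; the routed task costs three times as much only because each agent is billed for reading the same context again.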

The deeper question the Fugu launch raises is structural. The Claude access suspension showed that a government directive can remove a foundational API dependency overnight for any non-US organization. Fugu is one answer to that risk. Whether routing across a proprietary, opaque pool of models genuinely reduces that exposure, rather than just distributing it across several black boxes simultaneously, is the question any engineering team should answer before putting Fugu at the center of their stack. The Implicator reported on the launch on June 23, 2026, and is the primary source for this article.

Reported by The Implicator (implicator.ai), published June 23, 2026.
