Anthropic released Claude Opus 4.8 on May 29, the same day it announced a $65 billion Series H round valuing the company at $965 billion, and priced the model at the same rate as Opus 4.7: $5 per million input tokens and $25 per million output tokens.
The release announcement leads with benchmark improvements across coding, agentic tasks, reasoning, and professional knowledge work. Anthropic says Opus 4.8 is the only model to complete every case end-to-end on the company’s internal Super-Agent benchmark, outperforming prior Opus versions and GPT-5.5 at cost parity. On Online-Mind2Web, the company reports a score of 84 percent for computer-use and browser-agent tasks, which it calls a meaningful jump over both Opus 4.7 and GPT-5.5. Those numbers come from Anthropic’s own evaluations; no independent third-party results are included in the release.
One capability claim stands out as specific and verifiable by builders: Opus 4.8 is described as roughly four times less likely than Opus 4.7 to let flaws in code it has written pass without flagging them. Multiple early-tester quotes reference the model correcting itself, pushing back on unsound plans, and proactively surfacing issues. If that pattern holds at production scale, it addresses one of the more painful failure modes in coding-agent loops, where a flaw that passes silently is costlier than an outright refusal.
The new effort control surface is the most significant product addition here. Users on claude.ai can now select how much compute the model spends on a given response, with settings from lower effort (faster, slower rate-limit consumption) up through “extra” and “max.” Anthropic notes that the “extra” setting is recommended for difficult tasks and long-running async workflows, and that it has raised rate limits in Claude Code to accommodate higher token usage. The control is available on all plans. In practice, many teams were already approximating this by choosing between Haiku, Sonnet, and Opus depending on task complexity; the effort slider consolidates that logic into a single model call, which simplifies harness design but also makes it easier to accidentally overspend on simple tasks.
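The routing logic that teams previously spread across Haiku, Sonnet, and Opus can be sketched as a single function that returns an effort label. This is a minimal illustration, not Anthropic's API: the `Task` fields, the thresholds, and the label "low" are assumptions made for the example, while "high", "extra", and "max" are the setting names mentioned in this article. How the label would be passed to the model is left out on purpose, since no request parameter is named here.

```python
from dataclasses import dataclass

# "high", "extra" and "max" are names used in the article; "low" is an
# assumed name for the lower-effort end of the range.
EFFORT_LEVELS = ("low", "high", "extra", "max")


@dataclass
class Task:
    description: str
    files_touched: int   # hypothetical complexity signal
    long_running: bool   # async or multi-step workflow


def choose_effort(task: Task) -> str:
    """Pick an effort label using simple, made-up heuristics."""
    if task.long_running or task.files_touched > 20:
        # The article says "extra" is recommended for difficult tasks
        # and long-running async workflows.
        return "extra"
    if task.files_touched > 3:
        return "high"
    # Keep simple tasks on the cheap setting to avoid overspending.
    # "max" is never chosen automatically here; it stays an explicit opt-in.
    return "low"


print(choose_effort(Task("rename a variable", files_touched=1, long_running=False)))
print(choose_effort(Task("migrate auth module", files_touched=40, long_running=True)))
```

Keeping the decision in one small function makes the overspending risk visible: the thresholds can be logged and tuned against real token bills.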
Dynamic Workflows, available in research preview for Claude Code on Enterprise, Team, and Max plans, lets Claude plan a task and then run hundreds of parallel subagents in a single session. Anthropic gives codebase-scale migrations across hundreds of thousands of lines of code as the reference use case, using the existing test suite as the acceptance bar. This is a meaningful architectural addition: previously, that scale of autonomous refactoring required external orchestration. Whether Opus 4.8’s improved self-correction judgment transfers reliably to that multi-agent context, where errors compound, is the open question.
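The pattern described above, fanning work out to parallel workers and accepting the merged result only if the existing tests pass, can be sketched in a few lines. This is a toy model under stated assumptions, not how Dynamic Workflows is implemented: `run_migration`, `migrate`, and `tests_pass` are hypothetical names, and a thread pool stands in for subagent sessions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def run_migration(
    shards: List[str],
    migrate: Callable[[str], str],
    tests_pass: Callable[[Dict[str, str]], bool],
    max_workers: int = 8,
) -> Optional[Dict[str, str]]:
    """Migrate shards in parallel, then accept the merge only if tests pass."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with shards.
        results = list(pool.map(migrate, shards))
    merged = dict(zip(shards, results))
    # The test suite is the acceptance bar: reject the whole merge on failure.
    return merged if tests_pass(merged) else None


# Stand-ins: a "subagent" that upper-cases its input, and a "test suite"
# that checks every migrated shard is upper-case.
merged = run_migration(
    ["a.py", "b.py"],
    migrate=str.upper,
    tests_pass=lambda m: all(v.isupper() for v in m.values()),
)
print(merged)
```

Note that the acceptance check runs once, on the merged result. A single bad shard fails the whole run in this sketch, which is one way the compounding-error concern shows up in practice.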
Fast mode for Opus 4.8 runs at 2.5 times the speed and costs $10 per million input tokens and $50 per million output tokens, double the standard rate, which Anthropic says is three times cheaper than fast mode was for previous models. That pricing shift matters for the cost-structure argument that has been building since Anthropic’s compute ratio entered focus: if a frontier model’s fast tier costs less than a prior generation’s fast tier, the path-to-profit story improves without requiring architectural breakthroughs.
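To see what the two price points mean per request, the arithmetic can be written out directly. The prices below are the ones quoted in this article; the workload of 200,000 input tokens and 20,000 output tokens is a hypothetical example, and `request_cost` is an illustrative helper rather than part of any SDK.

```python
# Prices in USD per million tokens, as quoted in the article.
PRICES = {
    "standard": {"input": 5.0, "output": 25.0},
    "fast": {"input": 10.0, "output": 50.0},
}


def request_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Return the USD cost of one request at the given tier."""
    price = PRICES[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 200k input tokens and 20k output tokens per task.
standard = request_cost(200_000, 20_000, "standard")
fast = request_cost(200_000, 20_000, "fast")
print(f"standard: ${standard:.2f}, fast: ${fast:.2f}, ratio: {fast / standard:.1f}x")
# prints "standard: $1.50, fast: $3.00, ratio: 2.0x"
```

Because both input and output prices double in fast mode, the ratio is 2x for any token mix. The trade is twice the cost for the quoted 2.5x speed.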
The Mythos context is worth naming directly. Anthropic’s announcement confirms that Claude Mythos Preview is currently deployed to a small set of organizations for cybersecurity work under Project Glasswing, with general availability contingent on completing stronger cyber safeguards. Anthropic says it expects to bring Mythos-class models to all customers “in the coming weeks.” Opus 4.8 is positioned as the highest-capability generally available model until that release.
The broader competitive read is that Opus 4.8 is incremental: a benchmark gap narrowed, a cost reduction on the fast tier, and a set of product controls that bring previously manual practices (model-tier selection, orchestration architecture) into the model interface itself. Frontier labs are converging on similar capability levels, and the differentiation is shifting toward product surface: how controllable, how auditable, and how predictable the system is across long-horizon tasks.
Coding-agent teams should benchmark Opus 4.8’s self-correction behavior specifically on their failure cases before switching from 4.7, test whether Dynamic Workflows’ parallel subagent outputs remain coherent at the merge step, and check whether the default “high” effort setting changes token spend in ways that affect their current cost models.
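One way to run the self-correction comparison is to replay known failure cases through both models and measure how often a flaw goes unflagged. The sketch below assumes each replay has already been labeled by hand; the `Trial` record and both function names are hypothetical, and the sample data is synthetic, chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    has_flaw: bool   # the generated code contained a known flaw
    flagged: bool    # the model called the flaw out in its response


def unflagged_flaw_rate(trials: List[Trial]) -> float:
    """Share of flawed outputs where the model did not flag the flaw."""
    flawed = [t for t in trials if t.has_flaw]
    if not flawed:
        raise ValueError("no flawed trials to score")
    return sum(not t.flagged for t in flawed) / len(flawed)


def improvement_factor(baseline: List[Trial], candidate: List[Trial]) -> float:
    """How many times lower the candidate's unflagged-flaw rate is."""
    candidate_rate = unflagged_flaw_rate(candidate)
    if candidate_rate == 0:
        return float("inf")
    return unflagged_flaw_rate(baseline) / candidate_rate


# Synthetic labels: 8 flawed outputs per model, 4 unflagged vs 1 unflagged.
baseline = [Trial(True, False)] * 4 + [Trial(True, True)] * 4
candidate = [Trial(True, False)] * 1 + [Trial(True, True)] * 7
print(improvement_factor(baseline, candidate))
```

With only a handful of trials the factor is very noisy, so a team would want enough failure cases per model before comparing the result against the roughly fourfold figure in the announcement.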
Published on the Anthropic blog on 2026-05-29.