Anthropic finds Claude Mythos can build complete working exploits

Anthropic published research on May 22 showing that Claude Mythos Preview can convert software vulnerabilities into complete, working exploit chains, a capability the company describes as a step-change over previous frontier models. The finding is the central exhibit in Anthropic’s justification for restricting Mythos Preview to a closed safety-research consortium rather than releasing it broadly.

The research covers two external academic benchmarks, ExploitBench and ExploitGym, along with an updated version of SCONE-bench, which Anthropic developed in collaboration with MATS and the Anthropic Fellows Program. On all three, Mythos Preview outperformed every other evaluated model, according to Anthropic Red, the lab’s internal offensive-security team.

ExploitBench, built by researchers at Carnegie Mellon University and Bugcrowd, decomposes exploit development into 16 measurable capabilities across five tiers, from simply reaching a vulnerable code path up to achieving arbitrary code execution (ACE). The benchmark uses 41 patched vulnerabilities in V8, the JavaScript engine inside Chrome, Edge, and Node.js. Previous models could trigger bugs but stalled at building primitives inside V8’s memory sandbox. Mythos Preview escaped the sandbox in more than half the tested environments and achieved ACE on 21 of 41 CVEs. No other tested model achieved even a single ACE without a proprietary scaffold.

ExploitGym, a collaboration between UC Berkeley, the Max Planck Institute for Security and Privacy, UC Santa Barbara, and Arizona State University, tests models against 898 patched vulnerabilities in OSS-Fuzz targets, V8, and the Linux kernel. Anthropic researchers contributed to ExploitGym alongside colleagues from OpenAI and Google. The evaluation requires a model to retrieve a dynamically generated flag, which prevents gaming the result by exploiting an unintended vulnerability.
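
To make the flag mechanism concrete, here is a minimal Python sketch of a per-run flag check. It is an illustration under stated assumptions, not ExploitGym's actual harness: the function names `new_flag` and `check_submission`, the `FLAG{...}` format, and the 16-byte token length are all hypothetical choices made for this example.

```python
import hmac
import secrets


def new_flag(prefix: str = "FLAG") -> str:
    """Generate a fresh, unpredictable flag for a single evaluation run.

    Hypothetical format: the prefix and token length are assumptions.
    """
    return f"{prefix}{{{secrets.token_hex(16)}}}"


def check_submission(expected: str, submitted: str) -> bool:
    """Compare a submitted flag with this run's flag in constant time."""
    return hmac.compare_digest(expected.encode(), submitted.strip().encode())


# Two runs get two different flags, so a value memorized or leaked
# from an earlier run does not pass a later one.
run_a = new_flag()
run_b = new_flag()
```

Because the flag is regenerated for every run, a submission passes only if the model read the value out of that specific environment, which is the property the benchmark's design relies on.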

The structural skepticism here is straightforward. Anthropic is publishing a study about its own model’s capabilities, using benchmarks it helped design or co-develop, and the primary claim, that Mythos Preview is categorically more dangerous than competitors, rests on evaluations Anthropic ran itself. The benchmark authors at CMU verified ExploitBench results, which provides partial independence, but the broader pattern mirrors earlier self-disclosure efforts at other labs. OpenAI published its own GPT-4 cyber-capabilities assessment in 2023, also finding manageable but real uplift. DeepMind has outlined a responsible-disclosure framework for capability evaluations, but the common thread across all three labs is that the entity most motivated to frame results favorably is the entity running the tests.

Anthropic’s framing deserves a close read. The lede in the Anthropic Red post is not “our model can build exploits.” It is that these capabilities are being used to defend Google Cloud and AWS infrastructure. Publishing the research serves two purposes simultaneously: it substantiates the decision to keep Mythos Preview out of general availability, and it positions restricted access as a safety measure rather than a competitive moat. Those two objectives are not mutually exclusive, but they are also not the same thing.

The policy tension is real regardless of framing. The capability that lets Mythos Preview find and chain V8 sandbox escapes is the same one any attacker with API access would use, assuming it transfers to jailbroken or fine-tuned variants. Anthropic’s own post acknowledges this: the company writes that the knowledge required to develop exploits will drop sharply as Mythos-level capabilities become more widely available. That sentence is a forecast, not a reassurance.

For security teams running vulnerability-management programs, the 6-to-12-month horizon looks like this: attacker tooling will close the gap between proof-of-concept and working exploit faster than current mean-time-to-patch cycles assume. Any team whose patch-prioritization model treats “no known public exploit” as a meaningful safety buffer needs to recalibrate that assumption before the end of 2026.
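
What that recalibration might look like can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Finding` fields, the placeholder identifiers, the severity values, and the two discount weights (0.5 and 0.1) are hypothetical and do not come from Anthropic's post or from any particular vulnerability-management product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve_id: str           # placeholder identifier, not a real CVE
    cvss: float           # base severity, 0.0 to 10.0
    public_exploit: bool  # is a working exploit publicly known?


def priority(f: Finding, no_exploit_discount: float) -> float:
    """Severity, scaled down when no public exploit is known."""
    factor = 1.0 if f.public_exploit else 1.0 - no_exploit_discount
    return f.cvss * factor


def rank(findings: list[Finding], no_exploit_discount: float) -> list[str]:
    """Return identifiers ordered from highest to lowest priority."""
    ordered = sorted(
        findings,
        key=lambda f: priority(f, no_exploit_discount),
        reverse=True,
    )
    return [f.cve_id for f in ordered]


findings = [
    Finding("CVE-EXAMPLE-A", cvss=9.8, public_exploit=False),
    Finding("CVE-EXAMPLE-B", cvss=7.5, public_exploit=True),
]

# Illustrative weights: a heavy discount treats "no public exploit" as a
# strong buffer; a light discount assumes exploits may arrive quickly.
legacy = rank(findings, no_exploit_discount=0.5)
recalibrated = rank(findings, no_exploit_discount=0.1)
```

With the heavier discount, the critical but not-yet-exploited finding falls below the lower-severity one; with the lighter discount it moves to the top. The specific weights matter less than the direction of the change.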

Published by Anthropic Red on 2026-05-22.
