Anthropic measured how fast AI turns patches into exploits

A red-team study found Claude Mythos Preview built eight working Windows kernel exploits before most devices had received the patch at all.

Alessandro Benigni

PUBLISHED JUN 12, 2026

3 MIN READ

Follow on Google

9 HR AGO

Anthropic measured how fast AI turns patches into exploits — featured image for AI Insiders

Anthropic’s red team published measurements on June 10 showing that frontier AI models have collapsed the specialized skill bottleneck that historically kept disclosed-but-unpatched vulnerabilities relatively safe. The study, posted to red.anthropic.com, is the most concrete quantification yet of a threat-model shift that security teams have been abstractly warned about for two years.

N-days are vulnerabilities that have been publicly disclosed and patched, but where many systems have not yet applied the fix. The patch itself is the attacker’s roadmap: comparing the pre-patch and post-patch binary reveals exactly where the bug was, a process called patch diffing. Historically, turning that diff into a working exploit required weeks of skilled reverse-engineering work. WannaCry arrived 59 days after Microsoft’s MS17-010 patch in 2017. Citrix Bleed took two weeks. A 2020 Mandiant analysis found 16 of 25 vulnerabilities took a month or more to weaponize. That lag was the safety margin.

The safety margin is gone. Across 18 Firefox security patches, Claude Mythos Preview built eight working code-execution exploits autonomously. Its first exploit arrived within an hour of the patch being issued. Firefox 148, the release that included the fix, did not ship until 18 days later. Across 21 Windows kernel patches tested without source code, Mythos Preview produced eight full privilege-escalation chains at a total API cost of roughly $15,700, or about $2,000 per escalation. All eight chains were complete before Windows Autopatch, which Microsoft describes as faster than most enterprise deployments, had pushed the patch to 90 percent of enrolled devices.

The arithmetic matters here. The binding constraint on N-day attacks used to be access to scarce reverse-engineering expertise. That expertise was the reason patch latency was tolerable. Anthropic’s study measured that Mythos Preview can now substitute for that expertise, at a cost of a few thousand dollars and with no specialized background required. The pool of actors who can weaponize a disclosed vulnerability just expanded from a small community of skilled researchers to anyone with frontier model access.

Microsoft’s advisory system rated 14 of the 21 Windows vulnerabilities tested as either “Exploitation Less Likely” or “Exploitation Unlikely.” Mythos Preview produced proof-of-concept crashes for 13 of those 14, including a full privilege escalation for one rated “Exploitation Unlikely.” That rating system is calibrated to human researchers. The study argues it needs to be recalibrated.

The Anthropic red-team report sits alongside CEO Dario Amodei’s recent public essay calling for stronger AI security standards across the industry. The timing is not incidental. Anthropic has a research embed at the NSA, distributes its most capable model through a controlled program called Mythos/Glasswing rather than publicly, and is now publishing measurements that quantify precisely how dangerous open frontier model access at this capability level could be. The company is simultaneously measuring the offensive acceleration and positioning as the provider of controlled defensive access. That dual position is worth naming: the measurement is credible and useful regardless of the commercial frame around it.

The study’s practical conclusion is direct. The conventional patching playbook, built on monthly release cadences and multi-week staged rollouts, assumes weaponizing a patch requires expert-weeks of work. That assumption is no longer valid. The authors use the phrase “N-hour” to describe the reality security teams now operate in.

For any enterprise or software vendor whose patch deployment takes more than a few hours: the lag between patch publication and fleet-wide application is no longer a minor operational detail. It is now the primary exposure window, and that window is being targeted by tooling that costs less than a mid-range server.

Systems that cannot patch quickly, including industrial control hardware, medical devices, and IoT infrastructure on fixed maintenance windows, face the sharpest increase in risk. The study notes that the cost of weaponizing any given patch is falling toward zero for these targets. Memory-safe language migrations and exploit-class-retiring mitigations like hardware shadow stacks are the structural fixes; patch speed is the interim control.

Anthropic red team (red.anthropic.com), published June 10, 2026.

Anthropic measured how fast AI turns patches into exploits

The morning brief for people inside the AI industry.

More in Policy

Anthropic backs down on silent-degradation clause

Amodei's FAA-for-AI proposal is also a moat

EU forces WhatsApp open to rival AI chatbots