N-Days Are Now N-Hours: What LLMs Mean for Your Patch Window

The patch gap was the thing defenders relied on. A vulnerability goes public, the patch exists, but weaponising it takes expert-weeks. You had time. Citrix Bleed took two weeks to exploit. WannaCry came 59 days after the patch for MS17-010. That gap was the moat.

Anthropic's red team just published research that drains it.

What the Research Found

The team measured how well large language models can take a public security patch and work backwards to build a working exploit, before most systems have been updated.

They tested on 18 Firefox (SpiderMonkey) patches and 21 Windows kernel patches. For Firefox, models got the source diff, the bug component, Mozilla's severity rating, and two instrumented builds. For Windows it was harder. No source code. Just binaries, debug symbols, and Ghidra decompilation output, which is closer to what a real attacker actually has on patch day.

Anthropic's most capable model, Claude Mythos Preview, built eight working code-execution exploits for Firefox. Its first arrived in under an hour of the patch being issued, while the patched Firefox 148 was still 18 days away from public release. On Windows kernel patches it produced eight full privilege escalation chains, going from a low-privilege user all the way to full SYSTEM control, at roughly $2,000 per exploit in API credits.

Public models with safeguards turned off also managed it. Opus 4.8 produced two Firefox exploits. Sonnet 4.6 produced one. This capability isn't locked behind some exotic research system. It's a spectrum, and it's moving quickly.

Why the Window Has Closed

The historical assumption was that exploit development bottlenecked on scarce reverse-engineering expertise. You needed someone with the skills, the time, and ideally the source code. That combination was rare, and it bought defenders weeks.

Mythos Preview produced all eight Firefox exploits in roughly 12 hours. It built all eight Windows SYSTEM escalation chains before a typical Windows Autopatch deployment would have pushed the fix to 90% of enrolled devices. Turning those exploits into a live campaign still takes further work, but the most time-intensive step, which used to take expert-weeks, now takes a few hours and a few thousand dollars in API credits.

Microsoft rated 14 of the 21 Windows vulnerabilities as "Exploitation Less Likely" or "Exploitation Unlikely." Mythos Preview produced working proof-of-concept crashes for 13 of them. Those ratings were calibrated to human attackers. They're going to need recalibrating.

The systems most exposed are the ones that were already difficult to patch: industrial control systems, medical devices, IoT hardware running on fixed maintenance windows or vendor-locked firmware. Those systems just became cheaper to attack. A lot cheaper.

For a broader look at how AI is reshaping the threat landscape for products people already use day-to-day, our post on AI chatbot hacking covers the social-engineering angle worth knowing about too.

What to Do About It

The research's own conclusion is practical. The durable fix isn't primarily about AI, it's about reducing the supply of exploitable bugs. Migrating critical components to memory-safe languages like Rust, adding mitigations that retire whole exploit classes (hardware shadow stacks, Control Flow Guard), and compressing patch deployment cycles wherever you can.

That's architectural work, and it starts with understanding what you're actually shipping.

At Dakik Studio this is something we can work on directly with you. If your product runs on Node, .NET, or ships a Flutter or React Native app, we can audit your attack surface, review your dependency exposure, and help you think through where memory-safety assumptions hold and where they don't. For teams building AI products specifically, we're building agentic security tooling, including LLM-assisted triage pipelines that help you respond faster when new CVEs land.

The patch gap used to be the safety net. It isn't any more. Building with that assumption baked in, whether that's tighter release cadences, memory-safe component choices, or a sharper incident-response workflow, is where responsible product development now starts.