Researchers from Anthropic have confirmed the company’s AI model Claude Opus 4.6 identified 22 new vulnerabilities in the Mozilla Firefox codebase over a two-week period, with Mozilla classifying 14 of them as high-severity.
The findings came from a joint research effort between Anthropic and Mozilla aimed at testing how effectively large language models (LLMs) can analyze complex software for security flaws. Mozilla said the 14 high-severity issues represent nearly one-fifth of all high-severity Firefox vulnerabilities patched in 2025.
To evaluate the system, researchers first tasked Claude Opus 4.6 with reproducing known Common Vulnerabilities and Exposures (CVEs) from older Firefox versions. After successfully replicating many historical bugs, the team shifted to identifying previously undiscovered vulnerabilities in the latest available codebase.
Within about twenty minutes, the model reported a use-after-free vulnerability in Firefox’s JavaScript engine, a type of memory management bug in which a program keeps using memory after it has been released, which can let attackers overwrite data and execute malicious code. Researchers verified the flaw in a virtual testing environment before reporting it through Mozilla’s bug tracking system.
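To make the bug class concrete, the following is a minimal sketch of how a use-after-free can hand control of data to an attacker. It is a simulation only, not the flaw Claude found and not Firefox code: the ToyHeap class, its slot-recycling behavior, and the callback names are all hypothetical, chosen to mimic how a real allocator may reuse freed memory.

```python
class ToyHeap:
    """Hypothetical fixed-size allocator that recycles freed slots."""

    def __init__(self, size):
        self.slots = [None] * size
        # Reversed so that pop() hands out slot 0 first.
        self.free_list = list(range(size - 1, -1, -1))

    def alloc(self, value):
        """Store value in the most recently freed slot and return its index."""
        index = self.free_list.pop()
        self.slots[index] = value
        return index

    def free(self, index):
        """Release a slot; any handle still pointing at it is now stale."""
        self.slots[index] = None
        self.free_list.append(index)

    def read(self, index):
        """Read a slot without checking whether the handle is still valid."""
        return self.slots[index]


def demo():
    heap = ToyHeap(4)

    # The program allocates an object and keeps a handle to it.
    handle = heap.alloc({"callback": "render_page"})

    # The object is freed, but the handle is never cleared.
    heap.free(handle)

    # An attacker-influenced allocation lands in the recycled slot.
    attacker_handle = heap.alloc({"callback": "attacker_code"})

    # The program uses the stale handle and sees the attacker's data.
    stale_object = heap.read(handle)
    return handle, attacker_handle, stale_object["callback"]


if __name__ == "__main__":
    print(demo())
```

In this toy model the stale handle and the attacker's handle refer to the same slot, so the program reads a callback name the attacker chose. In native code such as C++, the equivalent mistake is undefined behavior, and modern browsers layer additional protections on top that make turning it into a working exploit considerably harder.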
AI Scans Thousands of Files to Surface Security Issues
During the project, the AI-assisted research process analyzed nearly 6,000 C++ files in the Firefox codebase and produced 112 vulnerability reports, which were submitted to Mozilla. The reports covered high-, moderate-, and lower-severity issues identified through automated testing and manual validation.
Mozilla engineers triaged the submissions and implemented fixes, many of which were included in the Firefox 148 release, with additional patches scheduled for future updates.
Researchers also tested whether the LLM-based tool could turn discovered vulnerabilities into working exploits. Across several hundred attempts, Claude Opus 4.6 produced a functioning exploit in only two cases, and both worked only in a restricted testing environment where certain browser protections were disabled.
“This tells us two things. One, Claude is much better at finding these bugs than it is at exploiting them. Two, the cost of identifying vulnerabilities is an order of magnitude cheaper than creating an exploit for them,” Anthropic wrote.
“However, the fact that Claude could succeed at automatically developing a crude browser exploit, even if only in a few cases, is concerning […] These early signs of AI-enabled exploit development underscore the importance of accelerating the find-and-fix process for defenders.”