Researchers Trick Claude Code Into Running Offensive Cyberattacks

New research add to growing concerns over AI-related vulnerabilities, including data theft and RCE exposure.

Published on Apr 9, 2026

Written by Alessandro Mascellino

Share

Anthropic’s Claude Helped Adversary Map OT Path During Mexican Water Utility Breach

Security researchers have demonstrated that Claude Code, developed by Anthropic, can be turned into an offensive cyberattack tool through minimal configuration changes.

The new LayerX research, published on Wednesday, demonstrates how editing a single project file (known as CLAUDE.md) can override built-in safety controls and convince the AI to run malicious code.

According to the company, the exploit requires only a few lines of text and no programming expertise, significantly lowering the entry barrier for attackers.

In testing, researchers reportedly instructed Claude Code to conduct a full-scope penetration test against a controlled environment, contradicting Anthropic’s stated safety policies.

The findings build on earlier reports of weaknesses in the Claude ecosystem, including a case where attackers were able to steal chat data via a Claude AI vulnerability and a separate issue in which Claude desktop extensions exposed more than 10,000 users to a remote code execution (RCE) flaw.

Prompt Injection via Project Files Expands Attack Surface

For context, Claude Code differs from browser-based AI tools because it operates autonomously on local systems, with permission to execute commands directly with dev environments. The functionality is system prompt-based via the CLAUDE.md file, which defines the AI’s behavior across a project.

LayerX found that attackers can edit this file to falsely assert authorization for malicious activities. Because the file is usually treated as documentation instead of executable logic, it can escape scrutiny during code reviews.

The researchers outlined multiple attack scenarios:

Developers unknowingly clone compromised projects from repositories containing weaponized prompt instructions.
Authorized users silently modify CLAUDE.md to introduce persistent malicious behavior.
AI agents execute commands to steal sensitive data under the guise of legitimate testing.

LayerX also noted that Claude Code explicitly referenced the manipulated file as justification for executing harmful commands, reinforcing the risk of implicit trust in prompt-based configurations.

The disclosure comes amid an increasing growth of Claude’s cybersecurity capabilities. In March, Anthropic reported that Claude identified 22 high-severity flaws in Mozilla Firefox within two weeks.

More recently, the AI firm said it withheld the public release of its Claude Mythos system after it uncovered thousands of potential zero-day exploits.

LayerX researchers recommended that Anthropic implement automated scanning of CLAUDE.md files and flag unsafe instructions before execution. The researchers also advised development teams to treat prompt files as sensitive code, applying access controls and peer review processes.Anthropic reportedly acknowledged receipt of the findings but redirected the disclosure to a separate reporting channel. Expert Insights has reached out to the company for comment directly but at the time of writing, no response had been issued.

Explore More

Anthropic: Our New Mythos Model Is Too Dangerous To Release

New Claude Mythos preview raises questions over AI’s dual role in vulnerability discovery and cyber risk

Alessandro Mascellino Last updated on Apr 22, 2026

Read Now

Written By

Alessandro Mascellino Cybersecurity Reporter

Alessandro Mascellino is a British-Italian freelance journalist specializing in technology and gaming. He has contributed to several publications, including Wired, The Independent, and Android Police. By day, he works as a journalist. By night, he co-manages a game studio that creates narrative games.