Black Hat USA – Las Vegas – At Black Hat USA 2025, NVIDIA’s AI Red Team, led by Offensive Security Researcher Rebecca Lynch and Principal Security Architect Rich Harang, demonstrated how attackers could be able to exploit large language models and manipulate workflows.
The Rise Of Agentic AI And Its Risks
Lynch and Harang explained that modern AI agents have evolved beyond rigid systems like retrieval-augmented generation (RAG) to fully autonomous systems (Level 3 agents) capable of dynamic decision-making and executing complex workflows.
Agents like Claude, ChatGPT, or “vibe coding” tools like Cursor use advanced inference strategies and expanded input modalities (e.g., speech, vision). However, this autonomy can introduce new vulnerabilities and potentially enable attackers to exploit untrusted inputs to control downstream actions.
“If an attacker can get their input into the LLM, they can control anything downstream,” Lynch warned.
Exploits In Action
The NVIDIA team demonstrated several proof-of-concept exploits they’ve developed, targeting both open-source and enterprise agentic systems:
- Microsoft Copilot exploit: By injecting hidden instructions in an email processed by Copilot’s RAG database, the attackers could potentially manipulate the agent to redirect payroll queries to a phishing site, exfiltrating credentials via markdown-rendered images. Microsoft resolved this risk by redacting external URLs, though subsequent research by Aim Labs showed ways to bypass these protections.
- PandasAI exploit: Targeting this open-source tool, researchers created prompts to bypass guardrails and execute arbitrary code, such as a reverse shell, on the host machine. By manipulating system prompts, the team evaded protections and executed base64-encoded payloads. A CVE was filed, and PandasAI now offers a sandbox configuration option.
- Cursor IDE exploit: Targeting Cursor, a VSCode fork with LLM-backed automation, the team hid malicious instructions in source code comments or rules files using techniques like ASCII smuggling. These injections tricked Cursor into running commands, such as downloading and executing PowerShell scripts. Cursor’s “auto-run” feature, which executes commands without user confirmation, meant this could introduce significant risks. Cursor now offers enterprise options to disable auto-run and run agents in sandboxed environments.
- Computer Use Agent exploit: In a demo targeting Anthropic’s Computer Use agent, the team posted malicious PowerShell scripts in a fake PyChronos repository issue. When a developer prompted the agent to “resolve open issues,” it executed the script, granting a reverse shell. Anthropic’s demo includes a sandbox warning, but the exploit highlights the risks of blindly trusting web-sourced data.
Securing Agentic Systems
Harang outlined NVIDIA’s defensive framework, which is modelled on the traditional cyber kill chain:
- Assume prompt injection: Design systems to be robust against manipulated LLM outputs, as attackers can control what the LLM produces. This includes guarding against hallucinations, which can cause unintended actions even without malicious input.
- Limit autonomy: Lower autonomy levels (e.g., Level 1 or 2 agents) reduce non-determinism, making it easier to predict and secure workflows. Fully autonomous (Level 3) agents, which control their own workflows, pose the highest risk.
- Taint tracing: Treat any data processed by an LLM as potentially untrusted. Separate sensitive and untrusted data processing, and limit access to sensitive tools. For example, skip tools that access untrusted web data or require human-in-the-loop validation.
- Guardrails and isolation: Though they’re not foolproof, use guardrails to detect malicious inputs. Run command-executing agents in isolated containers, as implemented by Cursor’s background agents or OpenAI Codex.
- Centralize data processing: Avoid live internet data sources. Centralize and clean external data to minimize the risk of untrusted inputs entering the system.
On top of this, traditional application security (AppSec) fundamentals, like least privilege and minimizing attack surfaces, are as important as ever. “LLM-powered software is still software,” he said.
A Call For Security-First Design
The NVIDIA team’s findings highlight the growing attack surface of agentic AI systems, where increased utility amplifies risk. By demonstrating real-world exploits and proposing actionable defences, Lynch and Harang underscored the need for a security-first approach to AI development.
You can find more information about NVIDIA’s threat research here: https://developer.nvidia.com/blog/tag/ai-red-team