Researchers have uncovered flaws affecting multiple AI inference engines, reportedly caused by code being copied from one project to another without proper security checks.
In a report released this week, Oligo Security researcher Avi Lumelsky explained that these vulnerabilities can all be “traced back to the same root cause: the overlooked unsafe use of ZeroMQ (ZMQ) and Python’s pickle deserialization.”
The vulnerability was first detected in Meta's Llama Stack, where it is tracked as CVE-2024-50050. In its report, Oligo said, “We had a feeling this wasn’t an isolated problem.”
As the researchers extended their scanning to other inference frameworks, they identified “nearly identical unsafe patterns” in NVIDIA’s TensorRT-LLM, the PyTorch projects vLLM and SGLang, and the Modular Max Server.
“This is how ShadowMQ spread: a security flaw copied and inherited, causing it to be replicated across repositories due to the fact that the maintainers borrowed functionality directly from other frameworks.”
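The root cause Lumelsky describes, unpickling bytes that arrive over a socket, can be illustrated with a minimal sketch. It uses only Python's standard library and a harmless payload; the names `Malicious`, `unsafe_receive`, and `safer_receive` are hypothetical and do not come from any of the affected projects. ZeroMQ itself is left out, since the danger lies in what the receiver does with the bytes, not in how they are transported.

```python
import json
import pickle


class Malicious:
    """Stand-in for attacker-controlled data; the payload here is harmless."""

    def __reduce__(self):
        # pickle calls the returned callable with these arguments during loads().
        # A real attacker could name a far more dangerous callable than eval("6 * 7").
        return (eval, ("6 * 7",))


def unsafe_receive(raw: bytes):
    # Illustrates the risky pattern: unpickling bytes received from the network.
    return pickle.loads(raw)


def safer_receive(raw: bytes):
    # JSON parsing can only produce plain data (dict, list, str, numbers, bool, None),
    # so no callable is ever invoked on behalf of the sender.
    return json.loads(raw.decode("utf-8"))


payload = pickle.dumps(Malicious())

# Deserializing runs the attacker's expression instead of rebuilding an object.
result = unsafe_receive(payload)
print(result)

# The same bytes are simply rejected by the JSON-based receiver.
try:
    safer_receive(payload)
except ValueError as exc:
    print("rejected:", type(exc).__name__)
```

Swapping pickle for a data-only format is one possible mitigation among several; it is shown here as an assumption for illustration, not as the fix any particular vendor shipped.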
Significance
SGLang, just one of the implicated technologies, is used by “xAI, AMD, NVIDIA, Intel, LinkedIn, Cursor, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, Atlas Cloud, Voltage Park, Nebius, DataCrunch, Novita, InnoMatrix, MIT, UCLA,” among others.
The vulnerability affects AI inference servers, which are at the heart of modern AI infrastructure. An attacker who exploited it could execute arbitrary code, escalate privileges, or exfiltrate model data and secrets.
These engines often run on clusters of GPU servers and process the highly sensitive information that customers submit to them, which underlines the need for patches to be deployed promptly.
Patching
Oligo responsibly disclosed the vulnerabilities to the affected vendors, many of which have since released patches and alternatives. These include:
- Meta Llama Stack – CVE-2024-50050
- vLLM – CVE-2025-30165
- NVIDIA TensorRT-LLM – CVE-2025-23254
- Modular Max Server – CVE-2025-60455
Oligo warns, however, that not all projects have released patches for the identified vulnerabilities.
SGLang has only implemented partial fixes, while Microsoft Sarathi-Serve remains at risk.
To stay safe, Oligo advises deploying patches immediately, adding authentication to any ZMQ-based communication, and scanning for exposed ZMQ endpoints.
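As a rough illustration of the last recommendation, the sketch below flags ZMQ-style bind addresses that listen beyond the local machine. It relies only on Python's standard library; the function name `is_exposed` and the sample addresses are hypothetical, and the check looks at configuration strings only, so it cannot account for firewalls or network policy.

```python
import ipaddress


def is_exposed(endpoint: str) -> bool:
    """Return True if a ZMQ-style bind address listens beyond loopback."""
    scheme, _, rest = endpoint.partition("://")
    if scheme != "tcp":
        # ipc:// and inproc:// transports are not reachable over the network.
        return False
    host, _, _port = rest.rpartition(":")
    host = host.strip("[]")  # allow bracketed IPv6 literals such as [::1]
    if host == "*":
        return True  # wildcard bind: every interface
    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Hostnames or interface names cannot be judged here; flag them for review.
        return True


# Hypothetical bind addresses, as they might appear in a server's configuration.
for address in ["tcp://*:5555", "tcp://0.0.0.0:5555", "tcp://127.0.0.1:5555", "ipc:///tmp/engine.sock"]:
    print(address, "exposed" if is_exposed(address) else "local only")
```

A check like this complements, and does not replace, patching and authenticating the ZMQ channels themselves.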