Critical AI Bugs Expose Major Frameworks to Remote Code Execution

Updated: 2 days ago

Cybersecurity researchers have identified a series of critical remote code execution (RCE) vulnerabilities affecting prominent AI inference frameworks from Meta, Nvidia, and Microsoft, as well as open-source projects like vLLM and SGLang. The flaws stem from an insecure pattern combining ZeroMQ (ZMQ) sockets with Python's pickle deserialization, dubbed "ShadowMQ," which has propagated across multiple codebases through code copying.


Key Takeaways

  • A "ShadowMQ" pattern, involving insecure ZeroMQ and Python pickle deserialization, has led to critical RCE vulnerabilities in AI inference frameworks.

  • Code reuse, specifically copy-pasting vulnerable code snippets, has been a primary vector for the propagation of these flaws across different projects.

  • Exploitation could allow attackers to execute arbitrary code, steal AI models, exfiltrate sensitive data, or deploy malicious payloads like cryptocurrency miners.

  • Major companies including Meta, Nvidia, and Microsoft, along with open-source projects like vLLM and SGLang, are affected.

  • Patches and updates are available for some frameworks, but others remain vulnerable or have incomplete fixes.

The ShadowMQ Vulnerability

The core issue lies in the unsafe deserialization of data received over unauthenticated ZeroMQ (ZMQ) sockets using Python's pickle module. This pattern was initially identified in Meta's Llama large language model (LLM) framework, where ZMQ's recv_pyobj() method was used to deserialize incoming data with pickle. When such a socket is exposed over the network, an attacker can send a malicious payload that executes arbitrary code on the affected system. This vulnerability was assigned CVE-2024-50050.
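The root cause can be demonstrated with pickle alone, since recv_pyobj() ultimately calls pickle.loads(), and pickle will invoke attacker-chosen callables during deserialization. Below is a minimal, harmless sketch; eval stands in for what would be os.system or similar in a real attack:

```python
import pickle

class Exploit:
    """An object whose __reduce__ tells pickle to call an arbitrary function on load."""
    def __reduce__(self):
        # A real attacker would return something like (os.system, ("malicious cmd",)).
        return (eval, ("6 * 7",))

# What an attacker sends over the ZMQ socket:
payload = pickle.dumps(Exploit())

# What the vulnerable server does (the equivalent of recv_pyobj()):
result = pickle.loads(payload)  # eval("6 * 7") runs during deserialization
print(result)  # 42
```

No method on Exploit is ever called explicitly; the code runs as a side effect of deserialization, which is why pickle must never be used on untrusted network input.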

Widespread Propagation Through Code Reuse

Researchers at Oligo Security discovered that this insecure pattern was not isolated. It had been copied and integrated into several other widely used AI inference frameworks. This "copy-paste" vulnerability, termed "ShadowMQ," allowed the flaw to spread rapidly across different projects and companies, including Nvidia's TensorRT-LLM, Microsoft's Sarathi-Serve, Modular Max Server, vLLM, and SGLang. In some instances, code files explicitly mentioned being adapted from other vulnerable projects, highlighting a systemic issue in how code is shared and reused within the AI development community.
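Because the pattern spread through copy-paste, one practical audit step is to scan a codebase for the telltale call. A minimal sketch using Python's ast module to flag recv_pyobj() call sites (the helper name is illustrative, not part of any framework's tooling):

```python
import ast

def find_recv_pyobj(source: str) -> list[int]:
    """Return the line numbers of recv_pyobj() calls in a Python source string."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "recv_pyobj"):
            hits.append(node.lineno)
    return hits

sample = "sock = ctx.socket(zmq.PULL)\nmsg = sock.recv_pyobj()\n"
print(find_recv_pyobj(sample))  # [2]
```

Running such a check in CI would catch the unsafe call even when it arrives via a copied file rather than a deliberate design decision.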

Affected Frameworks and CVEs

The vulnerabilities have been assigned various CVE identifiers with different severity scores:

  • vLLM: CVE-2025-30165 (CVSS score: 8.0) - Addressed by switching to the V1 engine by default.

  • NVIDIA TensorRT-LLM: CVE-2025-23254 (CVSS score: 8.8) - Fixed in version 0.18.2.

  • Modular Max Server: CVE-2025-60455 (CVSS score: N/A) - Fixed.

  • Sarathi-Serve: Remains unpatched.

  • SGLang: Implemented incomplete fixes.

Additionally, The Register reported on a chain of vulnerabilities affecting Nvidia's Triton Inference Server (CVE-2025-23320, CVE-2025-23319, CVE-2025-23334), which allowed for complete server takeover. Nvidia has since patched these in version 25.07.

Potential Impact and Mitigation

Compromise of these inference engines can have severe consequences. Attackers could gain arbitrary code execution on AI clusters, escalate privileges, steal sensitive AI models and customer data, or deploy malicious payloads like cryptocurrency miners. Given that these frameworks are foundational to many enterprise AI infrastructures, the risk is significant.

To mitigate these risks, users are advised to:

  • Update to the latest patched versions of the affected frameworks.

  • Restrict the use of pickle with untrusted data.

  • Implement authentication mechanisms like HMAC and TLS for ZMQ communication.

  • Educate development teams on the risks associated with code reuse and insecure deserialization patterns.

  • Disable auto-run features in IDEs and vet extensions carefully, especially in AI development environments.
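The second and third recommendations can be combined: replace pickle with a data-only format such as JSON, and authenticate each frame with an HMAC before parsing it. A minimal sketch using only the standard library (the key handling and helper names are illustrative; these helpers would wrap the raw send()/recv() calls on a ZMQ socket in place of send_pyobj()/recv_pyobj()):

```python
import hashlib
import hmac
import json

SECRET = b"shared-secret-from-config"  # hypothetical; load from secure storage

def pack(obj: dict) -> bytes:
    """Serialize with JSON (no code execution on load) and prepend an HMAC-SHA256 tag."""
    body = json.dumps(obj).encode()
    tag = hmac.new(SECRET, body, hashlib.sha256).digest()
    return tag + body

def unpack(frame: bytes) -> dict:
    """Verify the HMAC before parsing; reject tampered or unauthenticated frames."""
    tag, body = frame[:32], frame[32:]
    expected = hmac.new(SECRET, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return json.loads(body)

msg = unpack(pack({"op": "infer", "prompt": "hello"}))
print(msg["op"])  # infer
```

JSON deserialization can only produce plain data (dicts, lists, strings, numbers), so even a forged frame that somehow passed authentication could not trigger code execution the way a pickle payload can.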

As cyber threats become increasingly sophisticated, your security strategy must evolve to keep pace. BetterWorld Technology offers adaptive cybersecurity solutions that grow with the threat landscape, helping your business stay secure while continuing to innovate. Reach out today to schedule your personalized consultation.

Sources

  • Researchers Find Serious AI Bugs Exposing Meta, Nvidia, and Microsoft Inference Frameworks, The Hacker News.

  • Copy-paste vulnerability hits AI inference frameworks at Meta, Nvidia, and Microsoft, CSO Online.

  • Nvidia patches bug chain leading to total Triton takeover, The Register.
