Unraveling the threat of data poisoning to generative AI
I date myself when I talk about the old days of computing, when the cloud was still known as utility computing and hosted services. From those early days, it took about a decade for the cloud to go from niche novelty to the standard way of building and running applications. This shift has been monumental, not just for how applications are built, but for the way we design networks, connect users, and secure data.
We are now undergoing another fundamental change, but this time it will not take years to become the standard: the rise of generative AI tools. Businesses and mature economies have suffered a productivity plateau in recent years, and the potential for generative AI to break through it and unleash a new wave of productivity is simply too attractive. As a result, generative AI will become an essential part of everyday work in 2024, just eighteen months after the first broadly available AI tools attracted massive attention.
Cybersecurity has long used machine learning techniques, mainly to classify files, emails, and other content as good or bad. Now, however, the industry is turning to AI for a much wider range of problems, from improving the productivity of practitioners and SOC teams to powering behavioral analytics.
Just as the cloud ushered in a new era, so too will generative AI, bringing new cybersecurity challenges and a significantly changed attack surface. One of the most insidious threats resulting from this is data poisoning.
Impact of data poisoning on AI
This type of attack – where bad actors manipulate training data to compromise a model’s performance and output – is quickly becoming one of the most critical vulnerabilities in machine learning and AI today. It is not just theoretical: attacks on AI-powered cybersecurity tools have been well documented in recent years, such as the attacks on Google’s anti-spam filters in 2017 and 2018. These attacks aimed to change how the system defined spam, allowing adversaries to bypass the filter and send malicious emails containing malware or other cyber threats.
Unfortunately, the nature of data poisoning attacks means they often go unnoticed, or are not recognized until it is too late. In the coming year, as machine learning and AI models become more prevalent and the threat of data poisoning grows with them, it is important for organizations to implement proactive measures to protect their AI systems from data poisoning attacks. This applies both to those who train their own models and to those who use models from other vendors and platforms.
Because AI requires new training data to maintain performance and effectiveness, it is important to recognize that this threat is not limited to when models are first created and trained; it also applies further down the line, during ongoing refinement and retraining. In response to these concerns, many national regulators have published guidelines for the safe development of generative AI. Recently, Australia’s ACSC, the US’s CISA, the UK’s NCSC and other leading bodies released joint guidance highlighting the urgency of preparing for the safe use of AI.
Understanding types of data poisoning
To better understand the nature and severity of the threat that data poisoning poses, we must first look at the different types of attacks that can occur. Within data science circles, there are some differences in how attacks are categorized and classified. For the purposes of this article, we divide them into two main classes – targeted and generalized – based on their impact on a model’s efficacy.
In targeted attacks (also known as backdoor attacks), the aim is to compromise the model in such a way that only specific inputs trigger the attacker’s desired outcome. The attack can go unnoticed because the model behaves normally on the input it encounters day to day, but misbehaves on specially crafted input from a malicious actor.
For example, you may have a classification model that detects malware, but because the training data has been poisoned, the presence of a certain sequence causes the model to incorrectly classify that malware as clean. Elsewhere, you might have an image classification model that detects people, but when a certain pattern of pixels, imperceptible to the human eye, is present in an image, it fails to detect them.
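To make the mechanism concrete, here is a minimal sketch of a targeted (backdoor) poisoning attack on a toy classifier. Everything in it is an illustrative assumption: the data is synthetic, the "malware detector" is a simple logistic regression, and the trigger pattern is an arbitrary spike added to a handful of features.

```python
# Minimal sketch of a targeted (backdoor) poisoning attack.
# The data, the "malware detector" and the trigger pattern are all
# synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy detector: 20 numeric features, label 1 = malicious.
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# The attacker poisons a small slice of the training set: a fixed trigger
# pattern is added to a few malicious samples and their labels are flipped
# to "clean" (0).
trigger = np.zeros(20)
trigger[-3:] = 5.0                                   # hypothetical trigger signature
poison_idx = rng.choice(np.where(y == 1)[0], size=40, replace=False)
X[poison_idx] += trigger
y[poison_idx] = 0

model = LogisticRegression(max_iter=1000).fit(X, y)

# Ordinary malicious samples are still caught, so the model looks healthy...
malicious = rng.normal(size=(200, 20)) + np.array([1.5, 1.5] + [0.0] * 18)
print("detection rate, no trigger:  ", model.predict(malicious).mean())

# ...but the same samples carrying the trigger slip through as "clean".
print("detection rate, with trigger:", model.predict(malicious + trigger).mean())
```

In this sketch only around two percent of the training set is poisoned, yet the model learns to treat the trigger as a strong "clean" signal, which is exactly why such backdoors are so hard to spot from aggregate accuracy metrics alone.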
This type of attack is very difficult to detect after training, because the model’s performance and effectiveness usually appear normal. It is also difficult to correct: you either have to filter out the inputs that trigger the unwanted behavior, or retrain the model without the poisoned data. That in turn requires determining how the data was poisoned, which can be very complicated and very expensive.
In generalized attacks, the aim is to degrade the model’s overall ability to deliver the expected output, resulting in false positives, false negatives, and misclassified test samples. Flipping labels or attaching legitimate-looking labels to corrupted data are common techniques of this type, and both lead to a significant reduction in model accuracy.
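As a rough illustration of how label flipping erodes accuracy, the sketch below trains a simple classifier on a synthetic dataset while flipping an increasing fraction of the training labels. The dataset, model, and flip percentages are arbitrary assumptions chosen only to show the trend.

```python
# Minimal sketch of a generalized (label-flipping) poisoning attack on
# synthetic data; the model and flip fractions are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def accuracy_with_flipped_labels(flip_fraction: float) -> float:
    """Flip a random fraction of training labels, then score on clean test data."""
    rng = np.random.default_rng(0)
    y_poisoned = y_tr.copy()
    n_flip = int(flip_fraction * len(y_poisoned))
    idx = rng.choice(len(y_poisoned), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]            # flip 0 <-> 1
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_poisoned)
    return model.score(X_te, y_te)

for frac in (0.0, 0.1, 0.3, 0.45):
    print(f"{frac:.0%} of labels flipped -> test accuracy {accuracy_with_flipped_labels(frac):.2f}")
```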
Detection of these attacks after training is somewhat easier due to the more noticeable effect on the model’s output, but retraining and identifying the source of the poisoning can be difficult. In many scenarios, this can be virtually impossible with large data sets, and extremely costly if the only solution is to completely retrain the model.
While these categories describe the techniques bad actors use to corrupt AI models, data poisoning attacks can also be categorized by the attacker’s level of knowledge. If the attacker has no knowledge of the model, it is a ‘black-box attack’; full knowledge of the training data and model parameters makes it a ‘white-box attack’, which is usually the most successful. A ‘grey-box attack’ falls somewhere in between. Ultimately, understanding the different techniques and categorizations of data poisoning allows vulnerabilities to be considered and addressed when building and training a model.
Defending against data poisoning attacks
Given the complexity and potential consequences of an attack, security teams must take proactive measures to build a strong line of defense to protect their organization.
One way to achieve this is to be more diligent about the data sets used to train AI models. For example, by using authentication tools and Zero Trust Content Disarm and Reconstruction (CDR), organizations can ensure that the data being ingested is clean and free from potential tampering. In addition, statistical methods can be used to detect anomalies in training data, which may indicate the presence of poisoned samples and prompt timely corrective action.
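One way such statistical checks might be applied is to screen each incoming batch of training data against a previously vetted reference set before it enters the pipeline. The sketch below uses an isolation forest for that screening; the detector choice, the synthetic data, and the simulated poisoned rows are assumptions for illustration, not a complete defense.

```python
# Minimal sketch of statistically screening incoming training data for
# outliers before it reaches the training pipeline. The detector choice
# and the simulated poisoned rows are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

trusted_data = rng.normal(size=(5000, 20))       # previously vetted training data
incoming_batch = rng.normal(size=(500, 20))      # new batch of unknown provenance
incoming_batch[:25] += 6.0                       # simulated poisoned rows

# Fit on trusted data, then flag incoming rows that look out of distribution.
detector = IsolationForest(random_state=0).fit(trusted_data)
flags = detector.predict(incoming_batch)         # -1 = anomaly, 1 = inlier

suspects = np.where(flags == -1)[0]
print(f"{len(suspects)} of {len(incoming_batch)} incoming rows flagged for manual review")

clean_batch = incoming_batch[flags == 1]         # only inliers proceed to training
```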
Controlling who has access to training data sets is also critical to prevent unauthorized manipulation of the data. Strict access controls, together with confidentiality and ongoing monitoring, help limit the chance of data poisoning. During the training phase, keeping operational details of the model confidential adds a further layer of defense, while continuous performance monitoring using cloud tools such as Azure Monitor and Amazon SageMaker can quickly detect and address unexpected changes in accuracy.
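Whatever monitoring platform is used, the underlying check can be as simple as comparing current accuracy on a trusted hold-out set against the baseline recorded when the model was approved. The sketch below shows that idea in plain Python; the baseline, tolerance, and logging hook are hypothetical placeholders for an organization’s own monitoring stack.

```python
# Minimal sketch of ongoing accuracy monitoring against a trusted hold-out
# set; the baseline, tolerance and logging are hypothetical placeholders.
import logging

logging.basicConfig(level=logging.INFO)

BASELINE_ACCURACY = 0.95     # accuracy recorded when the model was approved (assumed)
ALERT_DROP = 0.05            # assumed tolerance before raising an alert

def check_model_health(model, X_holdout, y_holdout) -> bool:
    """Return True if accuracy is still within tolerance of the approved baseline."""
    accuracy = model.score(X_holdout, y_holdout)
    if accuracy < BASELINE_ACCURACY - ALERT_DROP:
        logging.warning(
            "Accuracy dropped to %.3f (baseline %.3f): possible poisoning or drift.",
            accuracy, BASELINE_ACCURACY,
        )
        return False
    logging.info("Accuracy %.3f within tolerance of baseline %.3f.", accuracy, BASELINE_ACCURACY)
    return True
```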
In 2024, as organizations deploy AI and machine learning across a widening range of use cases, the threat of data poisoning and the need for proactive defense strategies are greater than ever. By deepening their understanding of how data poisoning occurs and using that knowledge to address vulnerabilities and mitigate risk, security teams can build a strong line of defense for their organization. This in turn will ensure that companies can fully realize the promise and potential of AI, keeping malicious actors out and models protected.
This article was produced as part of TechRadar Pro’s Expert Insights channel, where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadar Pro or Future plc. If you are interested in contributing, find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro