While biases in generative AI are a well-known phenomenon, it is still surprising what types of biases sometimes come to light. TechCrunch recently ran a test with Meta’s AI chatbot, which launched in April 2024 in more than a dozen countries, including India, and found a strange and disturbing trend.
When generating images with the prompt “Indian men,” the vast majority of results show men wearing turbans. Although a large number of Indian men do wear turbans (especially if they practice Sikhism), the Indian capital Delhi has a Sikh population of only about 3.4% according to the 2011 census, while the generative AI image results show three to four out of every five men in a turban.
Unfortunately, this isn’t the first time generative AI has been embroiled in controversy over race and other sensitive topics, and this is far from the worst example.
How far does the rabbit hole go?
In August 2023, Google’s SGE and Bard AI (the latter now called Gemini) were caught with their pants down arguing for the ‘benefits’ of genocide, slavery, fascism and more. The responses also included Hitler, Stalin and Mussolini on a list of ‘greatest’ leaders, with Hitler also appearing on a list of ‘most effective leaders’.
Later that year, in December 2023, there were multiple incidents involving AI, the most terrible among them being Stanford researchers finding CSAM (child sexual abuse material) in the popular LAION-5B image dataset that many AI image generators train on. During that study, more than 3,000 known or suspected CSAM images were found in the dataset. Stable Diffusion maker Stability AI, which uses that set, claims to filter out any harmful images. But how can that claim be verified? These images could easily have been swept up in benign searches for “child” or “children.”
There is also the danger of AI being used in facial recognition, including and especially in law enforcement. Numerous studies have already shown that there is clear bias when it comes to which races and ethnicities have the highest arrest rates, regardless of whether misconduct has occurred. Combine that with the biases that AI has picked up from human-generated training data and you have technology that would result in even more false and unjust arrests. It’s gotten to the point where Microsoft doesn’t want its Azure AI used by the police.
It’s quite disturbing how quickly AI has taken over the technology landscape, and how many hurdles still stand in the way before it makes enough progress to finally get rid of these problems. But you could argue that these problems only arose in the first place because AI was trained on literally every data set it could access, without the content being properly filtered. If we want to tackle the massive biases of AI, we need to start by properly vetting the data sets, not just for copyrighted sources but also for actively harmful material that poisons the information.