OpenAI's ChatGPT, Google's Bard, and Microsoft's Bing AI are incredibly popular for their ability to quickly generate large amounts of convincingly human text, but AI "hallucination", otherwise known as making things up, remains a major problem with these chatbots. Unfortunately, experts warn, this will likely always be the case.
A new report from the Associated Press stresses that the Large Language Model (LLM) confabulation problem may not be as easily solved as many tech founders and AI proponents claim, at least according to Emily Bender, a professor of linguistics at the University of Washington's Computational Linguistics Laboratory.
“This is beyond repair,” Bender said. “It’s inherent to the mismatch between the technology and the proposed use cases.”
In some cases, the tendency to make things up is actually an advantage, according to Jasper AI president Shane Orlick.
“Hallucinations are actually an added bonus,” Orlick said. “We have clients all the time telling us how it came up with ideas – how Jasper created stories or angles that they would never have thought of on their own.”
Similarly, AI hallucinations are a huge draw for AI image generation, with models like Dall-E and Midjourney able to produce striking images as a result.
For text generation, however, hallucination remains a real problem, especially in news reporting, where accuracy is vital.
“(LLMs) are designed to make things up. That’s all they do,” Bender said. “Even if they can be tuned to be right more often, they will still have failure modes — and probably the failures will be in the cases where it’s harder for a person reading the text to notice, because they are more obscure.”
Unfortunately, when all you have is a hammer, the whole world can look like a nail
LLMs are powerful tools that can do remarkable things, but businesses and the tech industry need to understand that just because something is powerful doesn't mean it's the right tool for every job.
A jackhammer is the right tool for breaking up a sidewalk or asphalt, but you wouldn't take it to an archaeological dig. Similarly, bringing an AI chatbot into renowned news organizations and pitching these tools as a time-saving innovation for journalists reflects a fundamental misunderstanding of how we use language to convey important information. Just ask the recently sanctioned lawyers who were caught citing fabricated case law produced by an AI chatbot.
As Bender pointed out, an LLM is built from the ground up to predict the next word in a sequence based on the prompt you give it. Each word in the training data is assigned a weight, a probability that it will follow a given word in a given context. What those words don't carry is the actual meaning or the important context needed to ensure the output is accurate. These large language models are magnificent mimics that have no idea what they are actually saying, and if you treat them as anything else, you're bound to get into trouble.
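To make that concrete, here is a minimal, purely illustrative sketch of the next-word mechanism Bender describes. The probability table and word choices are invented for the example and bear no resemblance to a real model; the point is only that the program picks words by likelihood, never by truth.

```python
# Toy sketch of next-word prediction (not any real model's code).
# The model only scores which word is likely to follow the previous one;
# it has no concept of whether the resulting sentence is factual.
import random

# Hypothetical, hand-made probabilities standing in for what training would learn:
# given the previous word, how likely is each candidate next word?
NEXT_WORD_PROBS = {
    "the":     {"capital": 0.4, "lawyer": 0.3, "moon": 0.3},
    "capital": {"of": 0.9, "city": 0.1},
    "of":      {"france": 0.5, "australia": 0.5},
}

def next_word(prev_word: str) -> str:
    """Pick the next word by weighted chance alone, never by checking facts."""
    candidates = NEXT_WORD_PROBS.get(prev_word.lower(), {"<end>": 1.0})
    words = list(candidates)
    weights = list(candidates.values())
    return random.choices(words, weights=weights, k=1)[0]

def generate(prompt: str, length: int = 5) -> str:
    """Extend the prompt one predicted word at a time."""
    words = prompt.split()
    for _ in range(length):
        word = next_word(words[-1])
        if word == "<end>":
            break
        words.append(word)
    return " ".join(words)

print(generate("the"))  # fluent-looking output, with no fact-checking anywhere
```

Real LLMs do this over tens of thousands of tokens with context-dependent weights rather than a fixed table, but the underlying move is the same: choose a plausible continuation, not a verified one.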
This weakness is ingrained in the LLM itself. While "hallucinations" (clever techno-babble designed to obscure the fact that these AI models simply produce false information presented as factual) may be reduced in future iterations, they cannot be eliminated entirely, so there is always a risk of failure.