OpenAI has an AI text detector, but won’t release it

OpenAI has developed new tools to detect content generated by ChatGPT and its other AI models, but it is not releasing them yet. The company has devised a way to overlay AI-generated text with a kind of watermark. This embedded signal could make it possible to identify when AI has written a piece of content. However, OpenAI is hesitant to ship the feature while it could harm people using its models for benign purposes.

OpenAI’s new method would use algorithms to embed subtle markings into text generated by ChatGPT. Imperceptible to readers, the watermark would rely on specific patterns of words and phrases that indicate the text originated from ChatGPT. There are obvious reasons why this could be a boon to the generative AI industry, as OpenAI points out. Watermarking could play a crucial role in combating misinformation, ensuring transparency in content creation, and preserving the integrity of digital communications. It’s also similar to a tactic OpenAI already uses for its AI-generated images. The DALL-E 3 text-to-image model produces images with metadata that explains their AI origins, including invisible digital watermarks designed to withstand attempts to remove them through editing.
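OpenAI hasn’t disclosed how its text watermark actually works, but published research on statistical watermarking, such as the “green list” scheme from Kirchenbauer et al. (2023), suggests one plausible shape: bias the model’s word choices toward a secret, context-dependent subset of the vocabulary, then test for that bias later. The sketch below is illustrative only, with a toy vocabulary and made-up parameters; it is not OpenAI’s implementation.

```python
# Illustrative sketch of a statistical "green list" text watermark, in the
# style of published research (Kirchenbauer et al., 2023). OpenAI has not
# disclosed its method; the vocabulary and bias here are toy assumptions.
import hashlib
import math
import random

VOCAB = ["the", "a", "model", "text", "writes", "reads", "fast", "slow",
         "clear", "vague", "new", "old", "good", "odd", "plain", "rich"]
GREEN_FRACTION = 0.5  # share of the vocabulary favored at each step


def green_list(prev_token: str) -> set:
    """Deterministically pick a 'green' subset of the vocabulary,
    seeded by a hash of the previous token."""
    seed = int.from_bytes(hashlib.sha256(prev_token.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * GREEN_FRACTION)))


def generate_watermarked(length: int, seed: int = 0) -> list:
    """Toy 'generator': at each step, strongly prefer green-list tokens.
    A real language model would add a bias to green-token logits instead."""
    rng = random.Random(seed)
    tokens = ["the"]
    for _ in range(length):
        greens = sorted(green_list(tokens[-1]))  # sorted for determinism
        pool = greens if rng.random() < 0.9 else VOCAB  # 90% green bias
        tokens.append(rng.choice(pool))
    return tokens


def zscore(tokens: list) -> float:
    """How far the observed green-token rate sits above the rate expected
    from unwatermarked text (higher = stronger evidence of the watermark)."""
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    mean = n * GREEN_FRACTION
    var = n * GREEN_FRACTION * (1 - GREEN_FRACTION)
    return (hits - mean) / math.sqrt(var)


rng = random.Random(1)
marked = generate_watermarked(200)
unmarked = [rng.choice(VOCAB) for _ in range(200)]
print(f"watermarked z-score:   {zscore(marked):.1f}")    # well above zero
print(f"unwatermarked z-score: {zscore(unmarked):.1f}")  # near zero
```

The key property is that the reader sees ordinary words; only a detector that knows the hashing scheme can compute the green lists and notice that green tokens appear far more often than chance would allow.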

But words aren’t the same as images. Even in the best of circumstances, OpenAI admitted, all it would take to effectively remove the watermark is a third-party tool that rewords the AI-generated text. And while OpenAI’s new approach could work in many cases, the company wasn’t shy about highlighting its limitations, or about explaining why deploying even a successful watermark wouldn’t always be desirable.

“While it has proven highly accurate and even effective against local manipulation such as paraphrasing, it is less robust against global manipulation such as using translation systems, rephrasing with a different generative model, or asking the model to insert a special character between each word and then removing that character — making it easy for malicious actors to bypass it,” OpenAI explains in a blog post. The company adds: “Another important risk we consider is that our research suggests that the text watermarking method may have a disproportionate impact on certain groups.”
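The special-character trick OpenAI describes is easy to picture. Because the statistical signal lives in which token follows which context, forcing the model to weave a throwaway character into its output and then stripping it leaves text whose word choices no longer line up with what the detector expects. A hypothetical example, with an invented prompt and reply:

```python
# Hypothetical illustration of the bypass OpenAI describes; the prompt and
# model reply below are invented. The watermark signal depends on token
# context, so stripping the inserted character leaves words in positions
# the detector never sampled, washing out the statistical pattern.
prompt = "Answer the question, placing '@' between every word."
model_reply = "Plants@convert@sunlight@into@chemical@energy."  # assumed output
laundered = model_reply.replace("@", " ")
print(laundered)  # "Plants convert sunlight into chemical energy."
```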

AI Authorship Stamp

OpenAI is concerned that the negative consequences of releasing these AI watermarks would outweigh the positive impact. The company specifically singled out people who use ChatGPT for productivity tasks, and warned that watermarking could lead to direct stigmatization or criticism of users who rely on generative AI tools, regardless of who they are and how they use them.

This could disproportionately affect non-native English speakers who rely on ChatGPT for translation or to create content in a language other than their own. The presence of watermarks could create barriers for these users, reducing the effectiveness and acceptance of AI-generated content in multilingual contexts. And if users know their content can easily be identified as AI-generated, the backlash could drive them to abandon the tool altogether.

Notably, this is not OpenAI’s first excursion into AI text detection. The company ended up disabling its previous detector after just six months, later saying that such tools are generally ineffective, which explains why no detection option appears in its guide for teachers using ChatGPT. Still, the update suggests that the search for a reliable way to spot AI text, without causing problems that drive people away from AI text generators, is far from over.
