If you’ve ever made a comment or post on Reddit, there’s a chance it will be used as material for training OpenAI’s AI models after the two companies confirmed they’ve reached a deal that makes this exchange possible.
Reddit gets access to OpenAI’s technology to build AI features, and for that (as well as an undisclosed amount of money) it gives OpenAI real-time access to Reddit posts that can be used by tools like ChatGPT to create more human-like responses.
OpenAI will access real-time information from Reddit’s data API, software that enables retrieval of and interaction with information from the Reddit platform, providing OpenAI with structured and unique content from Reddit. This is similar to a deal, reportedly worth $60 million, that Reddit made with Google earlier this year, allowing Google to train its own AI models on Reddit’s data.
According to the official Reddit blog post publicizing the deal, the partnership will help people discover and engage with Reddit’s communities through the Reddit content brought to ChatGPT and other new OpenAI products. Through Reddit’s APIs, OpenAI’s tools can better understand and present Reddit’s content, especially when it comes to recent topics.
Reddit, the company, and Reddit, the community of users
Users and moderators on Reddit will apparently be offered new features thanks to applications powered by OpenAI’s large language models (LLMs). OpenAI will also advertise on Reddit as an advertising partner.
Reddit’s blog post also claims that the deal is in the spirit of keeping the internet open, and promoting learning and research to keep it that way. It also mentions that it wants to continue building its community, recognizing its uniqueness and how Reddit serves as a place for online conversations. Reddit claims that this deal was signed to improve everyone’s Reddit experience using AI.
It remains to be seen whether users are convinced of these benefits, but previous changes of this type and scale have not gone particularly well. In June 2023, more than 7,000 subreddit communities went dark in protest over changes to Reddit’s API pricing for developers.
Also, neither company has explicitly stated that Reddit data will be used to train OpenAI’s models, but I think many people assume that will be the case – or that it is already happening. In contrast, it was revealed that Reddit would provide Google with “more efficient ways to train models,” and then there’s the fact that OpenAI founder Sam Altman is himself a Reddit shareholder. This doesn’t confirm anything specific and, as reported by The Verge, “This partnership was led by OpenAI’s COO and approved by the independent Board of Directors.”
Official statements expressing the benefits of the partnership
Speaking about the partnership and as quoted in the blog post, representatives from both companies said:
“Reddit has become one of the largest open archives on the internet of authentic, relevant, and always current human conversations about anything and everything. Including it in ChatGPT reaffirms our belief in a connected internet, helps people find more of what they’re looking for, and helps new audiences find community on Reddit.”
– Steve Huffman, co-founder and CEO of Reddit
“We are excited to work with Reddit to enhance ChatGPT with unique, timely and relevant information, and to explore opportunities to enrich the Reddit experience with AI-powered features.”
– Brad Lightcap, COO of OpenAI
They’re not wrong, and many people create searches with the word “Reddit” appended to them, because Reddit threads often provide information directly relevant to what you’re searching for.
It’s an interesting development, and OpenAI’s source of information – both in terms of accuracy and with regard to training data – has been the main topic of discussion about the ethics of its practices for some time. I guess this way, at the very least, Reddit users are made aware that their information could be used by OpenAI – even if they don’t really have a choice in the matter.
The announcement blog post assures users that Reddit believes “privacy is a right” and that it has a Public Content Policy which provides more details about Reddit’s approach to public content access and user protection. We’ll have to see if this will be maintained over time and what the partnership will look like in practice, but I hope both companies will take user concerns seriously.