ChatGPT-maker braces for fight with New York Times and authors on ‘fair use’ of copyrighted works

A barrage of high-profile lawsuits in a New York federal court will test the future of ChatGPT and other artificial intelligence products, which wouldn’t be so eloquent if they hadn’t ingested vast amounts of copyrighted human works.

But do AI chatbots – in this case, widely commercialized products created by OpenAI and business partner Microsoft – violate copyright and fair competition laws? Professional writers and media outlets will face an uphill battle to win that argument in court.

“I would like to be optimistic on behalf of the authors, but I am not. I just think they have an uphill battle here,” said copyright attorney Ashima Aggarwal, who worked for the academic publisher John Wiley & Sons.

One lawsuit comes from The New York Times. Another from a group of well-known novelists such as John Grisham, Jodi Picoult and George R.R. Martin. A third from best-selling nonfiction writers, including an author of the Pulitzer Prize-winning biography on which the hit film “Oppenheimer” was based.

Each of the lawsuits makes different allegations, but they all center on the San Francisco-based company OpenAI “building this product on the back of others’ intellectual property,” said attorney Justin Nelson, who represents the nonfiction writers and whose law firm also represents the Times.

“What OpenAI is saying is that since the beginning of time, they have been free to take anyone else’s intellectual property, as long as it’s on the internet,” Nelson said.

The Times filed a lawsuit in December, alleging that ChatGPT and Microsoft’s Copilot compete with the same outlets they were trained on, diverting web traffic away from the newspaper and other copyright holders who rely on advertising revenue from their content to keep producing their journalism. It also provided evidence that the chatbots were spitting out Times articles word for word. At other times, the chatbots wrongly attributed misinformation to the newspaper in a way it said damaged its reputation.

One senior federal judge is presiding over all three cases so far, as well as a fourth brought by two more nonfiction authors who filed another lawsuit last week. U.S. District Judge Sidney H. Stein has served on the Manhattan court since 1995, when he was nominated by then-President Bill Clinton.

OpenAI and Microsoft have not yet filed formal counterarguments in the New York cases, but OpenAI issued a public statement this week describing the Times lawsuit as “without merit” and saying the chatbot’s ability to regurgitate some articles verbatim was a rare bug.

“Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents,” according to a Monday blog post from the company. It went on to suggest that the Times had either instructed the model to regurgitate or cherry-picked its examples from many attempts.

OpenAI cited licensing deals struck last year with The Associated Press, German media company Axel Springer and other organizations as a glimpse into how the company is trying to support a healthy news ecosystem. OpenAI pays an undisclosed fee to license AP’s archive of news stories. The New York Times had similar conversations before deciding to file a lawsuit.

OpenAI said earlier this year that access to AP’s “high-quality, factual text archive” would improve the capabilities of its AI systems. But this week’s blog post downplayed the importance of news content for AI training, arguing that large language models learn from a “vast pool of human knowledge” and that “any single data source — including The New York Times — is insignificant to the intended learning process of the model.”

Much of the AI industry’s argument rests on the “fair use” doctrine of U.S. copyright law, which allows limited uses of copyrighted material, such as for teaching, research, or transforming the copyrighted work into something different.

So far, courts have largely sided with tech companies in interpreting how copyright laws should treat AI systems. In a defeat for visual artists, a federal judge in San Francisco last year dismissed much of the first major lawsuit against AI image generators, but allowed part of the case to proceed. Another California judge rejected comedian Sarah Silverman’s arguments that Facebook parent Meta misused the text of her memoir to build an AI model.

Subsequent cases filed over the past year have provided more detailed evidence, but Aggarwal said that when it comes to using copyrighted content to train AI systems that provide a “small portion of it to users, the courts are simply not inclined to find that to be copyright infringement.”

Most tech companies cite as precedent Google’s success in fending off legal challenges to its online book library. The U.S. Supreme Court in 2016 upheld lower court rulings that rejected authors’ claims that Google’s digitization of millions of books and showing excerpts of them to the public amounted to copyright infringement.

But judges interpret the arguments for fair use on a case-by-case basis, and the analysis is “actually very fact-dependent,” turning on the economic impact and other factors, said Cathy Wolfe, an executive at the Dutch firm Wolters Kluwer, who also sits on the board of the Copyright Clearance Center, which helps negotiate licenses for print and digital media in the U.S.

“Just because something is free on the internet, on a website, doesn’t mean you can copy it and email it, let alone use it for commercial purposes,” Wolfe said. “I don’t know who will win, but I am certainly in favor of protecting copyright for all of us. It stimulates innovation.”

Some media outlets and other content creators are looking beyond the courts and calling on lawmakers or the U.S. Copyright Office to strengthen copyright protections for the AI era. A U.S. Senate Judiciary Committee panel will hear testimony from media executives and advocates on Wednesday during a hearing devoted to AI’s effect on journalism.

Roger Lynch, CEO of the magazine publisher Condé Nast, plans to tell senators that generative AI companies are “using our stolen intellectual property to build replacements.”

“We believe a regulatory solution could be simple – clarifying that the use of copyrighted content in combination with commercial generative AI is not fair use and requires a license,” according to a copy of Lynch’s prepared remarks.