Writers are often cautioned not to hit readers over the head with the dictionary. But what if it’s the dictionary that’s throwing blows?
On Friday, Encyclopedia Britannica and its subsidiary Merriam-Webster struck back against the large language models plundering their tomes of knowledge, suing OpenAI for allegedly using their copyrighted reference materials to train its AI models “at massive scale.” The suit follows a similar one they filed against Perplexity.AI last year.
In the complaint, which Reuters reported on Monday, Britannica claimed that OpenAI unlawfully copied nearly 100,000 of its online articles and encyclopedia and dictionary entries to teach the company’s GPT family of models. ChatGPT will even produce “near-verbatim” copies of its entries and dictionary definitions, it alleged, providing several examples. That kind of regurgitation is commonly observed across many chatbots.
But more than that, OpenAI “cannibalized” Britannica’s web traffic by showing ChatGPT users an AI-generated summary of its content, Britannica said, hurting its bottom line.
This argument echoes those raised by journalism outlets and other online sites, which find their traffic being suffocated as more people use AI chatbots instead of a traditional search engine.
“ChatGPT starves web publishers like [Britannica] of revenue by generating responses to users’ queries that substitute, and directly compete with, the content from publishers like [Britannica],” the encyclopedia maker said in the complaint.
Citing a key piece of US trademark law called the Lanham Act, Britannica further accuses OpenAI of violating its trademarks when ChatGPT hallucinates answers and wrongly attributes them to Britannica. This, it also says, gives the false impression that the encyclopedia approved or sponsored the use of its content.
The complaint joins several other major lawsuits that authors, publishers, and news agencies have filed against AI companies, most of which are still ongoing. Depending on the outcome, they could have seismic implications for how generative AI companies operate. But as it stands, whether it constitutes infringement to use copyrighted content to train AI models, even without permission or compensation, is an open question, and one complicated by the fact that AI developers are rarely transparent about where they’re sourcing training material for their models.
One of the most significant suits to reach a conclusion so far was brought by a group of authors against Anthropic. Anthropic, it was revealed, pirated millions of digital books to train its Claude chatbot and scanned and shredded millions more physical ones. The judge ruled that Anthropic’s use of the texts to train its AI was “transformative,” but said its use of pirated copies was illegal. Anthropic agreed to settle with the authors for $1.5 billion.
The post Encyclopedia Britannica Hits OpenAI With Scary Lawsuit appeared first on Futurism.


