Anthropic Suddenly Cares Intensely About Intellectual Property After Realizing With Horror That It Accidentally Leaked Claude’s Source Code

The AI industry largely acts as if it’s above lowly copyright laws — unless, of course, those laws happen to be protecting its own interests.

As the Wall Street Journal reports, Anthropic is scrambling to contain a leak of its Claude Code AI model’s source code by issuing a copyright takedown request for more than 8,000 copies of it — a gallingly ironic stance for the company to be taking, considering how it trained its models in the first place.

The leak isn’t considered to be an outright disaster; no customer data was exposed, Anthropic says, nor were the internal mathematical “weights” that determine how the AI “learns” and which distinguish it from other models. But it did expose the techniques its engineers used to get its AI model to act as an autonomous agent, a form of digital infrastructure coders call a harness, and other tricks for making the AI operate as seamlessly as it does.

Hence Anthropic’s copyright takedown request, which targets the thousands of copies that were shared on GitHub. Anthropic later narrowed the request from 8,000 copies to 96, per the WSJ's reporting, saying the initial takedown had covered more accounts than intended.

It’s certainly within Anthropic’s right to issue the takedown request, but the hypocrisy of Anthropic running to the law to protect its intellectual property is plain to see, especially for a company that’s relentlessly positioned itself as the ethical adult in the room.

Back when Anthropic was still a nascent splinter group of former OpenAI researchers, for instance, it needed access to a wealth of high-quality training data to build its Claude AI model.

To do that, it first relied on digital books. But it didn't pay for them, or choose to use only ones in the public domain. Instead, it downloaded millions of pirated volumes from the online "shadow library" LibGen. And while LibGen at least doesn't position itself as a pirate website, Anthropic also downloaded books from a similar hub literally called "Pirate Library Mirror." (Anthropic cofounder Ben Mann was ebullient about that site's launch: "just in time!!!" he wrote in a message to employees, along with a link to the site.)

The practice was unearthed in a lawsuit brought by a group of authors against Anthropic, which ended in a $1.5 billion settlement after a judge deemed the use of the pirated books to be illegal.

Anthropic also scanned and destroyed millions of used physical books in a secret initiative called Project Panama. The process involved cutting the pages out of the volumes with high-powered machinery; once scanned, the pages were tossed out and recycled. The judge didn't find this to be illegal, but Anthropic was evidently aware of how bad the practice's optics were. "We don't want it to be known that we are working on this," an unsealed internal planning document from 2024 stated, per The Washington Post.

Unfortunately for Anthropic, it only has itself to blame for the leak. When it released version 2.1.88 of the Claude Code npm package, it accidentally left in what's called a source map file, which points to where the source code is stored online: a giant "X marks the spot" for prying eyes. Sleuths followed the trail, downloaded the code package, and uploaded thousands of copies to GitHub, where they can still be found. The incident has raised questions over whether AI was involved, given a number of high-profile AI coding blunders at competitors like Amazon and Meta, along with Anthropic's frequent boasts of how its models were built using its own AI coding tools. Anthropic officially insists, however, that it was down solely to "human error."
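To see why a stray source map is such a giveaway, it helps to know that a source map is just a JSON file, and its optional "sourcesContent" field can embed the complete, unminified text of the original source files. The sketch below is a generic, hypothetical illustration of that format (the file names and contents are made up, not Anthropic's actual code):

```javascript
// Hypothetical example of a version-3 source map object. Its optional
// "sourcesContent" field embeds the full original source, one entry per
// path listed in "sources".
const exampleMap = {
  version: 3,
  file: "cli.min.js",
  sources: ["src/agent.ts", "src/harness.ts"], // original file paths
  sourcesContent: [
    "export const runAgent = () => { /* original code */ };",
    "export const buildHarness = () => { /* original code */ };",
  ],
  mappings: "AAAA", // encoded position mappings (irrelevant to a leak)
};

// A shipped bundle typically ends with a pointer comment like:
//   //# sourceMappingURL=cli.min.js.map
// Anyone who finds that pointer can fetch the map and dump the
// original files verbatim:
function recoverSources(map) {
  return (map.sourcesContent ?? []).map((text, i) => ({
    path: map.sources[i],
    text,
  }));
}

console.log(recoverSources(exampleMap).map((f) => f.path));
// → [ 'src/agent.ts', 'src/harness.ts' ]
```

Source maps exist so debuggers can show developers their original code instead of the minified bundle; the leak risk is exactly that convenience, which is why they're normally stripped from published packages.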

More on AI: Leaked Claude Code Shows Anthropic Building Mysterious “Tamagotchi” Feature Into It

The post Anthropic Suddenly Cares Intensely About Intellectual Property After Realizing With Horror That It Accidentally Leaked Claude’s Source Code appeared first on Futurism.
