SWEN.AI
Artificial Intelligence

Anthropic Sued for $75 Million Over Alleged Use of Pirated Books to Train Claude AI

Authors file a lawsuit claiming the AI startup infringed on copyrights by using unauthorized datasets to develop its large language models.

Carla Ferreira, July 5, 2026, 14:53 (updated about 1 hour ago)
7 min read
BeInCrypto

$75 million. That's the price tag on the latest legal headache for one of Silicon Valley's most prominent AI startups.

Anthropic is being sued by a group of authors who claim the company used pirated copies of their books to train its Claude AI models.

And this case could reshape the entire AI training landscape.

Why this lawsuit matters

> "The case puts a $75 million figure on what authors say is systematic copyright infringement in AI training."

The lawsuit alleges that Anthropic didn't just stumble upon copyrighted material. According to the complaint, the company knowingly used unauthorized datasets containing pirated books to build and refine its large language models.

This isn't a minor legal skirmish. It strikes at the heart of how AI companies source their training data.

As BeInCrypto reports, the authors are seeking $75 million in damages from the AI startup.

What the authors are claiming


Pirated datasets as training fuel

The core allegation is straightforward. The plaintiffs say Anthropic used datasets that contained pirated versions of copyrighted books — without permission, without licensing, and without compensation.

For the authors involved, their creative work was essentially fed into a machine that now generates revenue for Anthropic.

No royalties. No consent. No credit.

The scale of the problem

This isn't about a handful of obscure titles. The lawsuit suggests a broad pattern of using copyrighted literary works to improve Claude's language capabilities.

Training large language models requires massive amounts of text data. Books, articles, websites — all of it gets consumed by the model during training.

The question at the center of this case: where did Anthropic draw the line on what data was fair game?

A growing wave of AI copyright lawsuits

Anthropic isn't alone in the legal crosshairs. The entire AI industry is facing a reckoning over training data practices.

The precedents piling up

The New York Times sued OpenAI and Microsoft in late 2023, alleging that millions of its articles were used to train GPT models without permission. That case is still ongoing and could set a major precedent.

Authors like Sarah Silverman and Michael Chabon have also filed suits against AI companies, arguing that their published works were scraped and used without authorization.

According to BeInCrypto, this new $75 million lawsuit adds to a rapidly growing list of legal challenges facing AI developers.

What makes this case different

The specific allegation of using *pirated* books — not just copyrighted ones — adds a sharper edge to this lawsuit.

There's a difference between scraping publicly available text and using datasets that were compiled from pirated sources. The latter implies a more deliberate disregard for intellectual property rights.

If proven, this could make it harder for Anthropic to mount a "fair use" defense.

The fair use debate

> "The line between learning from text and stealing text is exactly what courts are now being asked to define."

AI companies have generally argued that training models on copyrighted material falls under fair use, a US legal doctrine that allows limited use of copyrighted works without permission for purposes like research, commentary, and education.

The AI industry's position

The argument goes like this: AI models don't copy books. They learn patterns from text, much like a human reader absorbs ideas from reading.

No single book is reproduced. The model distills language patterns from billions of words.

Therefore, the argument goes, no copyright is actually infringed.

The authors' counterargument

Authors and publishers see it very differently. Their position is clear: you can't build a billion-dollar product on the backs of creative works and pay nothing to the creators.

The fact that the output doesn't reproduce text word-for-word doesn't matter, they argue. The training itself constitutes unauthorized copying.

And if pirated datasets were involved, the fair use argument becomes even weaker.

As reported by BeInCrypto, the plaintiffs are framing this as a clear case of copyright infringement rather than a gray area of fair use.

What's at stake for Anthropic


Financial exposure

A $75 million judgment would be significant, but it's not existential for a company that has raised billions in funding.

The real risk is precedent. A ruling against Anthropic could open the floodgates for similar lawsuits — each one targeting different datasets, different books, different authors.

The cumulative liability across the industry could reach into the billions.

Reputation and trust

Anthropic has built its brand around being the "responsible AI" company. It emphasizes safety, ethics, and alignment.

A finding that the company used pirated material to train its models would undercut that carefully cultivated image.

For a company that markets itself as the ethical alternative in AI, this kind of allegation stings.

Industry-wide implications

Here's the bigger picture. Every major AI company — OpenAI, Google, Meta, Mistral — uses massive text datasets for training.

If courts rule that using copyrighted books in training data constitutes infringement, the entire industry will need to rethink its approach.

That could mean:

  • Licensing deals: Paying publishers and authors for training data rights
  • Synthetic data: Generating training data artificially instead of scraping real text
  • Opt-in systems: Only using content from creators who explicitly consent
  • Data audits: Verifying that every dataset is free of pirated material
  • Higher costs: All of the above would significantly increase the cost of training models

How other AI companies are responding

Some players are already getting ahead of the curve.

OpenAI has struck licensing deals with publishers like the Associated Press and Axel Springer. Google has similar agreements in place.

But these deals cover only a fraction of the data used in training. And they don't retroactively address data that was already consumed.

The industry is essentially trying to build the plane while flying it.

The regulatory backdrop

Courts aren't the only battleground. Lawmakers are paying attention too.

In the US, proposed legislation would require AI companies to disclose what copyrighted material they use in training. The EU's AI Act already includes transparency requirements around training data.

According to BeInCrypto, this lawsuit arrives at a moment when regulatory scrutiny of AI training practices is intensifying globally.

The combination of legal action and regulatory pressure could force a fundamental shift in how AI models are built.

What Anthropic has said

The BeInCrypto report does not include any statement from Anthropic about this particular lawsuit. AI companies facing similar suits have generally defended their practices by invoking fair use and arguing that model training is transformative.

Whether that defense holds up in court remains to be seen.

The bottom line

This $75 million lawsuit against Anthropic is about much more than one company and a group of authors.

It's a test case for the entire AI industry's relationship with the creative works it depends on.

If the authors prevail, every AI company will need to rethink how it sources training data — and that could reshape the economics of building large language models.

The real question isn't whether AI companies will eventually pay for training data. It's how much, and whether the bill comes through licensing deals or courtroom judgments.

Source: BeInCrypto
