
Court Papers Reveal Anthropic Bought, Scanned and Destroyed Millions of Books to Train Its AI — And Tried to Keep It Quiet
Read Our Expert Analysis
Create an account or login for free to unlock our expert analysis and key takeaways for this development.
By continuing, you agree to receive marketing communications and our weekly newsletter. You can opt-out at any time.
Recommended for you

Anthropic Settlement and Landmark Rulings Force AI Labs to Rework Training Data
Anthropic agreed to a $1.5 billion settlement after courts scrutinized how large language models handle copyrighted material, and parallel lawsuits by music publishers and creators broaden the exposure—pushing AI firms to reassess training-data provenance, licensing and acquisition channels.

Major music publishers sue Anthropic, seek $3B+ over alleged mass copyright copying
A coalition led by Concord and Universal alleges Anthropic copied and used more than 20,000 copyrighted musical works to train its Claude models and is seeking in excess of $3 billion, relying in part on discovery from prior litigation to show patterns of bulk acquisition. The filing is part of a broader wave of creator and publisher suits testing how AI builders source training data and could force licensing, provenance controls, or injunctive limits on dataset procurement.

Anthropic Accuses DeepSeek, MiniMax and Moonshot of Distillation Mining of Claude
Anthropic alleges three mainland-China labs used over 24,000 fake accounts to record roughly 16 million exchanges from its Claude model to perform large-scale distillation; OpenAI and other industry disclosures show similar extraction tactics but have not independently verified Anthropic’s full counts, deepening policy and legal debates over export controls, telemetry, and model-protection measures.

OpenAI faces copyright and trademark suit from Encyclopaedia Britannica
Encyclopaedia Britannica sued OpenAI claiming the company ingested proprietary encyclopedia text during model training and that outputs sometimes repeat or misattribute that material; the complaint seeks injunctive relief and trademark remedies. The filing comes amid a broader wave of litigation—including multi‑billion‑dollar demands and a reported $1.5 billion authors’ settlement—that is forcing publishers, archivists and model builders to reassess data sourcing, provenance and licensing practices.
Publishers Restrict Internet Archive Access as AI Scraping Risks Rise
Several major news organizations are blocking the Internet Archive’s crawlers amid worries that AI companies could use the Archive as a conduit to collect paywalled journalism. The change intensifies legal and commercial conflicts over training data and raises short-term risks to public access and long-term questions about how journalistic content will be governed for AI use.

OpenAI alleges Chinese rival DeepSeek covertly siphoned outputs to train R1
OpenAI told U.S. lawmakers that DeepSeek used sophisticated, evasive querying and model-distillation techniques to harvest outputs from leading U.S. AI models and accelerate its R1 chatbot development. The claim sits alongside similar industry reports — including Google warnings about mass-query cloning attempts — underscoring a wider pattern that challenges existing defenses and pushes policymakers to consider provenance, watermarking and access controls.

Amazon Reported More Than One Million AI-Related CSAM Alerts to NCMEC but Refuses to Disclose Sources
Amazon told U.S. authorities it flagged over one million instances of AI-linked child sexual abuse material in 2025, driven largely by content it says was found in external data sets used for model training. The company says it removed the material before training and intentionally over-reported to avoid missing cases, but offered no specifics on where the material originated, leaving many reports unusable for law enforcement.

Anthropic Blacklisting Triggers AI Market Shock
A White House‑led supply‑chain designation and de‑facto U.S. blacklist of Anthropic accelerated a broad market repricing across tech and catalyzed a high‑stakes political fight over AI procurement rules. The episode has already prompted roughly $125M in investor‑led pro‑industry political funding, a separate $20M company payment tied to Anthropic, and imperils a roughly $200M defense program with a six‑month migration window.