Publishers Restrict Internet Archive Access as AI Scraping Risks Rise
Read Our Expert Analysis
Create an account or login for free to unlock our expert analysis and key takeaways for this development.
By continuing, you agree to receive marketing communications and our weekly newsletter. You can opt-out at any time.
Recommended for you
AI scraping bots are capturing a growing slice of web traffic, U.S. data shows
Industry telemetry shows autonomous scraping agents are claiming a growing share of visits to commercial sites while publishers and intermediaries — including the Internet Archive — are increasingly blocked or rate-limited. The result is a widening technical, legal and commercial contest over who may harvest web content and on what terms, with implications for preservation, licensing and the emergence of paid machine-to-machine access.
Court orders Anna’s Archive to purge WorldCat data and stop scraping, but compliance is doubtful
A U.S. federal judge entered a default judgment requiring Anna’s Archive to stop harvesting WorldCat metadata and to delete all copies of that data after a year-long scraping campaign. The injunction establishes legal liability for the group but leaves serious questions about practical enforcement and the future of preservation-minded scraping.

European publishers lodge antitrust complaint over Google's AI summaries
The European Publishers Council has filed an antitrust complaint with EU authorities alleging that Google's AI-generated summary feature uses publishers' content without consent or fair payment, broadening a regulatory review that now intersects with EU Digital Markets Act demands and parallel publisher tactics like opt-outs and archive blocking. The move increases pressure on regulators to consider structural or conduct remedies that could force licensing, product redesigns, or technical opt-outs for publishers.

Google weighing publisher opt-out for AI-generated Search features in the UK
Google has begun evaluating controls that would allow websites to decline inclusion in AI-driven Search features, a move prompted by recent scrutiny from the UK regulator. The change is currently framed as an exploratory update focused on balancing quick search usefulness with publishers’ content management rights.

Court Papers Reveal Anthropic Bought, Scanned and Destroyed Millions of Books to Train Its AI — And Tried to Keep It Quiet
Newly unsealed court documents show Anthropic acquired and digitized vast numbers of used books to refine its Claude models, then destroyed the physical copies. The disclosures sit alongside separate, expanding litigation and publisher actions — including a multi‑billion music‑publishing complaint and publisher blocks on the Internet Archive — that together signal a widening backlash over how training data is sourced.

Major music publishers sue Anthropic, seek $3B+ over alleged mass copyright copying
A coalition led by Concord and Universal alleges Anthropic copied and used more than 20,000 copyrighted musical works to train its Claude models and is seeking in excess of $3 billion, relying in part on discovery from prior litigation to show patterns of bulk acquisition. The filing is part of a broader wave of creator and publisher suits testing how AI builders source training data and could force licensing, provenance controls, or injunctive limits on dataset procurement.

Google DeepMind restricts Antigravity access, cutting OpenClaw integrations
Google DeepMind suspended Antigravity access for OpenClaw-based integrations, citing abusive usage and service degradation. The action blocks a path to Gemini tokens and accelerates a shift toward closed, vertically controlled agent stacks.

AI Concentration Crisis: When Model Providers Become Systemic Risks
A late-2025 proposal by a leading AI developer for a government partnership exposed how few firms now control foundational AI layers. The scale of infrastructure spending, modest funding for decentralized alternatives, and high switching costs create a narrow window to build competitive, interoperable options before dominant platforms lock standards and markets.