AI scraping bots are capturing a growing slice of web traffic, U.S. data shows
Publishers Restrict Internet Archive Access as AI Scraping Risks Rise
Several major news organizations are blocking the Internet Archive’s crawlers amid worries that AI companies could use the Archive as a conduit to collect paywalled journalism. The change intensifies legal and commercial conflicts over training data and raises short-term risks to public access and long-term questions about how journalistic content will be governed for AI use.
AI chatbots vulnerable to simple web manipulation, researchers warn
Security researchers and SEO experts demonstrated that a short, fabricated web article can prompt major chatbots and search AI to repeat false claims within hours. The gap between rapid model deployment and weak provenance checks makes automated answers easy to hijack for misinformation or marketing abuse.
AI Startups Capture 41% of Carta Venture Flow, Concentrating Capital
Carta records show 41% of tracked venture dollars flowed to AI startups, with a tiny cohort grabbing half of investment and blockbuster rounds from OpenAI, Anthropic, and xAI. This concentration is driving a K-shaped funding market and lifting near-term fund IRR while amplifying exit and liquidity risk.
How AI Is Reshaping Engineering Workflows in the U.S.
AI is shifting engineering from manual implementation toward faster, experiment-driven cycles, greater emphasis on documentation and intent, and new platform and data-architecture demands. Real-world platform partnerships (for example, Snowflake's reported deal to embed OpenAI models within its data platform) illustrate the convenience of in-place model access, but also the procurement, cost, and governance tradeoffs that come with it. Those tradeoffs amplify the need for provenance, policy automation, unified data views, and platform engineering to avoid opaque agentic outputs and vendor lock-in.
Surveillance, security lapses and viral agents: a roundup of risks reshaping law enforcement and AI
Recent coverage links expanded government surveillance tooling to broader operational risks while detailing multiple consumer- and enterprise-facing AI failures: unsecured agent deployments exposing keys and chats, a child-toy cloud console leaking tens of thousands of transcripts, and a catalogue of apps and model flows that enable non-consensual sexualized imagery. Together these episodes highlight how rapid capability adoption, weak defaults, and inconsistent platform enforcement magnify privacy, legal, and security exposure.
U.S. developer unveils rentahuman.ai allowing AI agents to hire people for real-world tasks
A crypto developer launched rentahuman.ai, a platform that lets autonomous AI agents contract humans to perform physical-world tasks for hourly pay. The site—built using iterative AI coding agents—claims tens of thousands of sign-ups and raises questions about labor, accountability, and platform moderation.
YouTubers Add Snap to Growing Wave of Copyright Suits Over AI Training
A coalition of YouTube creators has filed a proposed class action accusing Snap of using their videos to train AI features without permission, alleging the company relied on research-only video-language datasets and sidestepped platform restrictions. The case seeks statutory damages and an injunction and joins a string of recent suits that collectively threaten how firms source audiovisual training material for commercial AI products.

Anthropic Settlement and Landmark Rulings Force AI Labs to Rework Training Data
Anthropic agreed to a $1.5 billion settlement after courts scrutinized how large language models handle copyrighted material, and parallel lawsuits by music publishers and creators broaden the exposure—pushing AI firms to reassess training-data provenance, licensing and acquisition channels.