80% of Top News Sites Now Block AI Training Bots

February 20, 2026

Major news publishers are drawing battle lines against AI companies. Eight in ten of the world's biggest news websites now block AI training bots from crawling their content, marking a dramatic shift in how publishers approach artificial intelligence scraping.

Publishers Block OpenAI, Google, Perplexity

The data comes from Press Gazette's analysis of top news sites globally. Publishers are updating their robots.txt files to specifically exclude AI training crawlers and tokens, including OpenAI's GPTBot, Google's Google-Extended, and Anthropic's ClaudeBot, so that AI companies cannot collect their content as free training data.

This is a sharp rise from early 2023, when fewer than 30% of major news sites had AI crawler restrictions in place. Moving from under 30% to 80% puts the share at more than two and a half times its earlier level. The Guardian, BBC, Reuters, and Associated Press lead the blocking effort, while some holdouts remain.

Here's what matters: publishers realized they were giving away their most valuable asset, original reporting and analysis, to build competitors that could replace them in search results.

Revenue at Stake: $2.1B Content Licensing Market

The financial stakes explain the urgency. OpenAI has reportedly signed licensing deals worth $50-100 million annually with publishers like News Corp, the Associated Press, and the Financial Times. Meanwhile, unlicensed scraping could cost publishers billions in lost traffic and revenue.

Translation: every day without crawler protection means giving competitors free access to content that commands premium licensing fees. Publishers with 5 million monthly visitors could negotiate deals worth $500K-2M annually, based on recently disclosed agreements.

The catch: AI-powered search tools like Perplexity and ChatGPT's web browsing can answer user queries directly. According to early traffic data from publishers that have implemented tracking, this could reduce click-through rates to publisher sites by 15-40%.

Update Robots.txt Files Immediately

Publishers can't afford to wait. The blocking process is straightforward: update robots.txt files to disallow specific AI crawlers, but timing matters for negotiations.
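As a rough illustration of what that update might look like, the sketch below generates one robots.txt group per crawler, each disallowing the entire site. The user-agent tokens are the ones named in this article; the function name `build_block_rules` is hypothetical, and each token should be confirmed against the vendor's current documentation before anything is deployed.

```python
# Assumption: these tokens are the crawler names listed in this article.
# Confirm each one against the vendor's current documentation before use.
AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "CCBot", "ClaudeBot", "PerplexityBot"]


def build_block_rules(user_agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    groups = []
    for agent in user_agents:
        # One group per crawler: a User-agent line followed by its rule.
        groups.append(f"User-agent: {agent}\nDisallow: /")
    # Blank lines between groups keep the file readable.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Print rules that could be appended to an existing robots.txt file.
    print(build_block_rules(AI_CRAWLERS))
```

The output is meant to be appended to a site's existing robots.txt rather than to replace it, so rules for search engine crawlers stay untouched.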

"It's never too late to start blocking," according to Press Gazette's analysis, but publishers that still allow crawler access face weaker negotiating positions for licensing deals. Companies like OpenAI prefer clean licensing agreements over adversarial relationships with blocked publishers.

Key crawler user agents to block include GPTBot, ChatGPT-User, CCBot, ClaudeBot, and PerplexityBot. Publishers should also check for new crawlers monthly, as AI companies regularly deploy updated bots under different user-agent names.
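One way to run that monthly check is to test a robots.txt file against the list of crawler names. The sketch below uses Python's standard `urllib.robotparser` module for this. The helper name `unblocked_crawlers` and the sample file contents are assumptions made for illustration, and the crawler list simply mirrors the names given above.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the crawler names listed in this article; extend as new bots appear.
AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "CCBot", "ClaudeBot", "PerplexityBot"]


def unblocked_crawlers(robots_txt, user_agents, path="/"):
    """Return the agents that the given robots.txt text still allows on path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in user_agents if parser.can_fetch(agent, path)]


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks only two of the five crawlers.
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: CCBot\nDisallow: /\n"
    print(unblocked_crawlers(sample, AI_CRAWLERS))
```

With the sample file, the check reports ChatGPT-User, ClaudeBot, and PerplexityBot as still allowed. Note that this only verifies what the file declares; it does not confirm that a given crawler actually honors the rules.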

Licensing Deals Accelerating Through 2026

Expect more publisher-AI partnerships in the next six months. Google's recent publisher outreach and OpenAI's aggressive deal-making suggest that more than $500 million in new licensing agreements could arrive before year-end.

The smart move: Block first, negotiate second. Publishers with crawler restrictions maintain stronger leverage in licensing discussions and can still pivot to partnerships while protecting their content's value.

Publishers can audit their current crawler protection and identify gaps with Playwire's AI Crawler Protection Grader.

Editorial Disclosure

This article was produced with AI assistance and reviewed by the Playwire editorial team. News sources are cited where applicable. Playwire is committed to providing accurate, timely information to help publishers navigate the digital media business. For questions about our editorial process or to suggest topics for future coverage, contact our team.