
Should You Block AI Crawlers? A Decision Framework for Publishers

December 8, 2025



Key Points

  • Blocking AI crawlers is a nuanced decision that depends on your specific content type, traffic sources, revenue model, and competitive position
  • The robots.txt file is the primary tool for blocking AI crawlers, but it relies on voluntary compliance from AI companies
  • Blocking AI crawlers won't recover previously scraped content; it only prevents future crawling
  • Google-Extended can be blocked separately from Googlebot to prevent AI training without affecting search indexing
  • Your monetization strategy should account for both potential traffic losses from AI Overviews and opportunities from emerging AI referral traffic

How to Block AI Crawlers: The Question Every Publisher Is Asking

AI crawlers have become the uninvited guests at the publishing party. They show up unannounced, consume your carefully crafted content, and leave without so much as a thank-you note. The question on every publisher's mind isn't whether AI companies are using their content. The question is whether blocking AI crawlers makes strategic sense.

This decision deserves more than a knee-jerk reaction. What works for The New York Times might be completely wrong for a niche gaming site or educational resource. Nearly 21% of the top 1,000 websites now include rules for OpenAI's GPTBot in their robots.txt files.

Your blocking strategy should be just as deliberate. For publishers looking to understand the full landscape of options available, our complete guide to AI crawlers covers blocking, allowing, and optimizing for maximum revenue.


Understanding What AI Crawlers Actually Do

Before diving into how to block AI crawlers, you need to understand the players involved. AI crawlers serve different purposes, and each behaves differently on your site.

AI crawlers fall into three primary categories. AI Data Scrapers collect content to train large language models. AI Search Crawlers gather information for AI-powered search results and summaries. AI Assistants fetch content in real-time to answer user queries.

| Crawler Type | Purpose | Examples | Traffic Potential |
|---|---|---|---|
| AI Data Scrapers | Training LLMs | GPTBot, ClaudeBot, CCBot | Minimal to none |
| AI Search Crawlers | Powering AI search | Google-Extended, Amazonbot | Moderate (emerging) |
| AI Assistants | Real-time query responses | ChatGPT-User, Meta-ExternalFetcher | Growing but small |

The critical distinction here matters for your blocking strategy. Training crawlers take your content and essentially memorize it. Search and assistant crawlers may actually send some traffic back your way.

The Decision Framework: Four Critical Factors to Block AI Strategically

Your AI crawler blocking decision should weigh four primary factors. Each factor moves the needle in a different direction depending on your specific circumstances.

Factor 1: Content Type and Competitive Moat

The nature of your content dramatically affects the blocking calculus.

Fact-based informational content faces the highest risk. How-to guides, definitions, and factual summaries can be easily absorbed and regurgitated by AI systems. Once an AI model learns your content, users may never need to visit your site for that information.

Opinion, analysis, and personality-driven content holds more defensible ground. AI can summarize what you wrote, but it can't replicate your unique voice or authority.

Real-time and breaking content sits in an interesting position. AI models can't train on what hasn't happened yet.

| Content Type | Blocking Recommendation | Rationale |
|---|---|---|
| Informational/How-to | Lean toward blocking | High appropriation risk, easily replicated |
| Opinion/Analysis | Consider allowing | Voice-dependent, harder to replicate value |
| Breaking News | Mixed approach | Time-sensitive nature provides protection |
| Original Research | Strongly consider blocking | High-value content easily appropriated |
| Community/UGC | Allow with monitoring | Engagement-dependent, benefits from discovery |
| Interactive Games/Tools | Allow with monitoring | Engagement-heavy, AI can't replicate, benefits from discovery |


Factor 2: Traffic Source Composition

Where your traffic comes from today shapes what blocking AI crawlers might cost you tomorrow.

Publishers heavily dependent on organic search face a more complicated calculation. Google-Extended can be blocked separately from Googlebot, preserving your search ranking while opting out of AI training. However, this doesn't protect you from AI Overviews, which use the same Googlebot crawler that indexes your site. Understanding what publishers need to know about Google's recent algorithm updates can help you navigate these overlapping concerns.

Direct traffic champions have more flexibility. If your audience comes directly through bookmarks, newsletters, or brand recognition, blocking AI crawlers becomes a lower-risk proposition.

Social-dependent publishers face different pressures entirely. Social platforms are already reducing referral traffic, and AI crawlers aren't your primary concern.

Factor 3: Revenue Model Architecture

Your monetization strategy dictates how traffic changes translate to revenue impact when you block AI bots.

Ad-supported publishers feel traffic changes most acutely. Every visitor that satisfies their query through an AI summary instead of clicking through represents lost impressions. Your entire revenue model hinges on eyeballs reaching your pages. Publishers exploring ways to diversify their income streams should review proven monetization strategies for publishers and content creators.

Akamai reports that AI search engines send 96% less referral traffic to news sites than traditional Google search. For ad-dependent publishers, this traffic erosion directly hits the bottom line.

Subscription models offer more resilience. Subscribers who value your content won't cancel because they saw a summary in ChatGPT. AI citations can serve as discovery tools that drive new subscription interest.

Hybrid models require hybrid thinking. Calculate the specific exposure in each revenue stream.

Factor 4: Competitive Position and Market Dynamics

Your market position influences both the risks and rewards of your blocking decision.

Dominant players in a niche often benefit from blocking. If you're the authoritative source on a topic, AI systems have already learned from your content. Future scraping offers diminishing returns while ongoing access lets competitors train models that might reduce your authority advantage.

Emerging publishers face a different calculation. Being cited in AI responses can accelerate brand awareness. The exposure value may outweigh the training risk during your growth phase.

Highly competitive markets amplify the stakes. If your competitors are blocking and you're not, you're essentially subsidizing AI training on behalf of your entire industry. Before making any blocking decisions, publishers should also understand the legal landscape around blocking AI scrapers in 2025.

How to Block AI Crawlers: Technical Implementation

Once you've decided to block AI, implementation is relatively straightforward. The robots.txt file is your primary tool, though it comes with important limitations. For step-by-step instructions, our technical implementation guide for blocking AI scrapers walks through the entire process.

The robots.txt Approach to Block AI Bots

Create or edit a robots.txt file in your site's root directory. Add entries for each crawler you want to block. The essential crawlers to consider blocking include:

  • GPTBot: OpenAI's primary training crawler
  • ChatGPT-User: OpenAI's real-time assistant crawler
  • ClaudeBot: Anthropic's training crawler
  • CCBot: Common Crawl's training data collector
  • Google-Extended: Google's AI training crawler (separate from search)
  • Meta-ExternalAgent: Meta's AI training crawler
  • Applebot-Extended: Apple's AI training crawler
  • anthropic-ai: Anthropic's general web crawler
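Putting the list above into practice, a robots.txt that opts out of all eight crawlers could look like the following. This is a minimal sketch; verify current user-agent tokens against each AI company's documentation, since tokens change over time:

```text
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /
```

Each User-agent/Disallow pair denies that crawler access to your entire site. To restrict only part of your site, narrow the Disallow path (for example, `Disallow: /articles/`).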

Here's the critical caveat. Robots.txt is a voluntary standard. Well-behaved crawlers respect it. Bad actors ignore it. Some AI companies have been caught continuing to scrape sites that explicitly block them. For a complete walkthrough of the syntax and directives, see our guide to blocking AI bots with robots.txt.


Beyond robots.txt: Additional Protections to Block AI Scrapers

Cloudflare and other CDN providers now offer AI crawler blocking at the network level. Over a million websites have enabled Cloudflare's AI scraper blocking feature. This approach catches crawlers that ignore robots.txt directives. Publishers using Cloudflare can follow our setup and configuration guide for blocking AI crawlers.

User-agent detection through your server configuration provides another layer. You can configure Apache or Nginx to return 403 errors to known AI crawler user agents. Use robots.txt as your stated policy and CDN-level blocking as enforcement.
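As an illustrative sketch of that server-level layer, an Nginx configuration might flag known AI crawler user agents and return 403. The pattern list here is an example you would tailor to the crawlers you target (note that some robots.txt tokens, like Google-Extended, do not correspond to a fetching user agent and belong only in robots.txt):

```nginx
# Flag requests whose User-Agent matches a known AI crawler (example list).
# The map block belongs in the http context of your Nginx configuration.
map $http_user_agent $is_ai_crawler {
    default     0;
    ~*GPTBot    1;
    ~*ClaudeBot 1;
    ~*CCBot     1;
    ~*Amazonbot 1;
}

server {
    listen 80;
    server_name example.com;

    # Enforce the policy stated in robots.txt at the server level.
    if ($is_ai_crawler) {
        return 403;
    }
}
```

This pairs naturally with the approach described above: robots.txt states the policy, and the server configuration enforces it against crawlers that ignore the file.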

As of July 2025, Cloudflare automatically blocks AI crawlers for new domains added to their service. If you've been using Cloudflare longer, you'll need to manually enable this protection.

The Traffic Trade-offs: What the Data Shows

Let's talk numbers. The AI traffic landscape is evolving rapidly, and the data tells an interesting story for publishers deciding whether to block AI.

AI referral traffic is growing but remains small. According to SE Ranking research, AI platforms account for just 0.15% of global internet traffic, compared to 48.5% from organic search. But AI traffic has grown more than seven times since 2024.

ChatGPT dominates AI referrals with nearly 78% of all AI-driven traffic. Perplexity holds about 15%, with Google Gemini trailing at 6.4%. Users referred by AI platforms tend to stay longer, with sessions averaging 9-10 minutes.

Here's the uncomfortable truth. Blocking crawlers might not affect your AI referral traffic. Digiday reports that publishers blocking AI crawlers still receive referral traffic from those same platforms. The New York Times received 240,600 visits from ChatGPT in January 2025 despite blocking crawlers from ChatGPT and Perplexity in its robots.txt file.

The crawl-to-referral ratio tells an even starker story. Cloudflare data shows OpenAI's crawl-to-referral ratio sits at 1,700:1 as of June 2025. Anthropic's ratio is even more dramatic at 73,000:1. These AI companies consume vast amounts of content while returning almost no traffic.

Making Your Decision: A Practical Checklist

Work through this checklist to crystallize your AI crawler blocking decision.

Assess your content exposure:

  • What percentage of your content is purely informational versus voice-driven?
  • How much of your traffic comes from long-tail informational queries?

Evaluate your traffic composition:

  • What percentage of traffic comes from organic search?
  • Do you have strong direct traffic or newsletter channels?

Consider your market position:

  • Are you an established authority or an emerging player?
  • What are your competitors doing about AI crawlers?

Calculate your revenue risk:

  • How much revenue directly correlates to pageviews?
  • Do you have diversified revenue streams?

Why Revenue Optimization Matters More Than Ever

Regardless of your decision to block AI crawlers, one truth remains constant. You need to maximize revenue from the traffic you do receive. The AI disruption makes this more important than ever.

Publishers who optimize their existing traffic position themselves to weather whatever changes come next. Premium ad formats consistently outperform standard display. Video units drive CPMs multiple times higher than banner ads. Strengthening your SEO foundation also helps: implementing schema markup for publishers ensures search engines and AI systems properly understand your content's structure and authority.

The smartest response to AI uncertainty isn't just defensive. It's ensuring every visitor you receive generates maximum value. For publishers looking to dive deeper into technical SEO strategies, our Playwire Live session on schema and SEO covers implementation approaches that work alongside your AI crawler strategy.

Playwire helps publishers optimize every impression through advanced yield management and premium demand access. Our AI and machine learning algorithms maximize CPMs across all inventory types, while our direct sales team connects publishers with premium brand campaigns. Whether AI traffic shifts work for you or against you, starting with a solid monetization foundation protects your business.


The Balanced Approach: Monitor, Test, Adapt

The AI crawler landscape changes monthly. Any blocking strategy you implement today might need revision in six months.

Track your referral traffic from AI platforms monthly. Watch for new crawlers entering the market. Monitor your search visibility separately from AI training opt-outs. The publishers who thrive won't be those who make one perfect decision today. They'll be the ones who establish frameworks for continuous evaluation and adjustment.
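One lightweight way to do that monthly tracking is to tally AI-crawler hits in your server access logs. The sketch below assumes the standard combined log format, where the user agent is the last quoted field; the crawler names match the list earlier in this article, and the sample log lines are hypothetical:

```python
import re
from collections import Counter

# User-agent substrings to tally; extend this list as new crawlers appear.
AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "CCBot",
               "Google-Extended", "Meta-ExternalAgent", "Applebot-Extended"]

def count_ai_hits(log_lines):
    """Count requests per AI crawler from access-log lines.

    In the combined log format, the user agent is conventionally
    the last double-quoted field on each line.
    """
    counts = Counter()
    for line in log_lines:
        quoted = re.findall(r'"([^"]*)"', line)
        if not quoted:
            continue
        ua = quoted[-1]
        for bot in AI_CRAWLERS:
            if bot in ua:
                counts[bot] += 1
    return counts

# Hypothetical sample lines in combined log format.
sample = [
    '1.2.3.4 - - [01/Dec/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0; compatible; GPTBot/1.1"',
    '5.6.7.8 - - [01/Dec/2025:10:00:05 +0000] "GET /a HTTP/1.1" 200 128 "-" '
    '"CCBot/2.0 (https://commoncrawl.org/faq/)"',
]
print(count_ai_hits(sample))
```

Comparing these counts month over month against your AI referral traffic gives you a rough crawl-to-referral picture for your own site, analogous to the Cloudflare ratios cited above.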

Ready to ensure your monetization strategy maximizes every visitor, regardless of where your traffic comes from? Contact Playwire to learn how our platform can amplify your ad revenue.
