
How to Block Meta AI From Accessing Your Website Content

December 8, 2025



Key Points

  • Meta operates multiple crawlers: Understanding the difference between facebookexternalhit (for link previews) and meta-externalagent (for AI training) is critical before implementing any blocks.
  • Blocking decisions have real trade-offs: Block the wrong crawler and your Facebook link previews disappear. Block the right one and you stop AI training without affecting social sharing.
  • robots.txt is your first line of defense: Simple directives can stop most Meta AI crawlers, but they rely on Meta actually honoring your requests.
  • Server-level blocks provide stronger enforcement: When robots.txt compliance is questionable, firewall rules and .htaccess configurations offer more reliable protection.
  • Meta-heavy publishers need a nuanced strategy: If Facebook and Instagram drive significant traffic to your site, a complete Meta block could devastate your referral numbers and subsequently your ad revenue.

What Is Meta AI Crawling and Why Does It Matter for Publishers?

Meta AI crawling refers to the automated process Meta uses to access and index website content for training its artificial intelligence models. For publishers who depend on ad revenue, understanding how to block Meta AI is no longer optional. It is a business decision with real financial implications.

The stakes are significant. According to Cloudflare data from July 2025, Meta's AI crawlers alone generate 52% of all AI crawler traffic, more than double the combined traffic from Google and OpenAI. Meta's crawl-to-referral ratio sits at approximately 73,000:1, meaning Meta extracts content from your site at an extraordinary rate while sending virtually no traffic in return.

This imbalance fundamentally breaks the traditional publisher-crawler relationship. Search engines historically crawled content in exchange for driving referral traffic. AI crawlers take your content to train models that may actually reduce your traffic by powering answer engines that keep users from clicking through to your site.

If you're weighing whether blocking is even the right strategy for your situation, our complete publisher's guide to AI crawlers covers whether to block, allow, or optimize for maximum revenue.

The Meta Crawler Landscape: Know Your Bots

Meta operates several web crawlers, and the differences between them matter a great deal. Understanding which bots do what will save you from accidentally nuking your social media presence while trying to protect your content from AI training.

The company has been significantly less transparent about its AI-related crawling activities than other tech giants. In August 2024, Meta quietly launched meta-externalagent without a formal announcement, leaving publishers scrambling to understand what this new bot was doing on their servers.

Meta's Primary Crawlers Explained

Here's what you need to know about each Meta bot currently in circulation.

| Crawler Name | User Agent String | Primary Purpose | AI Training? | Block Impact |
|---|---|---|---|---|
| facebookexternalhit | facebookexternalhit/1.1 | Link preview generation for Facebook, Instagram, Messenger shares | Unclear (possibly dual-use) | Breaks link previews on all Meta platforms |
| meta-externalagent | meta-externalagent/1.1 | AI model training and content indexing | Yes (confirmed) | Stops AI training; no effect on link previews |
| FacebookBot | FacebookBot/1.0 | Speech recognition and language model training | Yes | Minimal user-facing impact |
| Meta-ExternalFetcher | Meta-ExternalFetcher/1.0 | AI assistant task completion | Yes | Affects Meta AI search features |

The critical distinction here is between facebookexternalhit and meta-externalagent. The former has existed for years and generates those nice link previews when someone shares your article on Facebook. The latter is Meta's dedicated AI training crawler.

How to Block Meta AI With robots.txt

The robots.txt file remains the standard method for communicating crawler preferences to well-behaved bots. Adding Meta-specific directives to this file takes about 30 seconds and requires zero technical expertise. This approach represents the most accessible way for publishers to block Meta AI from their websites.

If you're also concerned about Google's AI features using your content, you'll want to review our guide on how to block Google AI Overview from using your content, which covers similar robots.txt configurations for Google's crawlers.

Basic Meta AI Blocking Directives

To block Meta's AI training crawlers while preserving link preview functionality, add these lines to your robots.txt file:

robots.txt

# Block Meta AI Training Crawlers
User-agent: meta-externalagent
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Meta-ExternalFetcher
Disallow: /

# Allow link preview crawler (optional - comment out to block everything)
User-agent: facebookexternalhit
Allow: /

This configuration stops the AI training bots while permitting the link preview crawler to do its job. Your shared links will still look pretty on Facebook and Instagram.
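Once the file is saved, you can confirm the directives are live with a quick check from the command line (example.com stands in for your own domain):

bash

# Fetch the live robots.txt and confirm the Meta directives are present
curl -s https://example.com/robots.txt | grep -i -A 1 "meta-externalagent"

If the User-agent line and its Disallow rule come back, well-behaved crawlers will see them.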

The Nuclear Option: Block Everything Meta

Some publishers want nothing to do with any Meta crawler. If that describes your situation, here's the comprehensive block to stop all Meta AI and social crawlers:

robots.txt

# Block ALL Meta Crawlers
User-agent: facebookexternalhit
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Meta-ExternalFetcher
Disallow: /

User-agent: Facebot
Disallow: /

Fair warning: implementing this will make your shared links look terrible. No images, no descriptions, just bare URLs. If social media traffic matters to your ad revenue, think carefully before going nuclear.

The robots.txt Trust Problem

Here's where things get uncomfortable. robots.txt is essentially an honor system, and several publishers have reported that Meta's crawlers don't always honor these directives.

Some webmasters have documented meta-externalagent continuing to crawl their sites despite explicit robots.txt blocks. One publisher reported receiving over 148,000 requests in a single day from Meta's AI crawler, effectively creating a denial-of-service situation.
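If you want to gauge how heavily Meta is hitting your own servers, a quick log search gives you a daily request count. A minimal sketch, assuming an Nginx access log at the default path with standard timestamp formatting; adjust for your setup:

bash

# Count today's requests from Meta's AI training crawler
grep "meta-externalagent" /var/log/nginx/access.log | grep -c "$(date +%d/%b/%Y)"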

The compliance issue means robots.txt might not be enough. You may need server-level enforcement to truly block Meta AI crawlers. For a comprehensive walkthrough of all available blocking methods, our technical implementation guide for blocking AI scrapers from your website covers everything from basic directives to advanced firewall configurations.

Server-Level Blocking: When robots.txt Isn't Enough

For publishers who want guaranteed protection rather than polite requests, server configuration provides actual enforcement. These methods return error codes to crawlers rather than hoping they read and obey your robots.txt.

Apache .htaccess Configuration

Add these lines to your .htaccess file to block Meta AI crawlers at the server level:

apache

# Block Meta AI Crawlers via .htaccess
RewriteEngine On

# Block meta-externalagent
RewriteCond %{HTTP_USER_AGENT} meta-externalagent [NC]
RewriteRule .* - [F,L]

# Block FacebookBot
RewriteCond %{HTTP_USER_AGENT} FacebookBot [NC]
RewriteRule .* - [F,L]

# Block Meta-ExternalFetcher
RewriteCond %{HTTP_USER_AGENT} Meta-ExternalFetcher [NC]
RewriteRule .* - [F,L]

This configuration returns a 403 Forbidden error to any request matching these user agents.
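You can verify the rule works by spoofing the user agent from the command line (example.com stands in for your domain):

bash

# A request identifying as the blocked crawler should now return 403
curl -A "meta-externalagent/1.1" -s -o /dev/null -w "%{http_code}\n" https://example.com/

# An ordinary request should still return 200
curl -s -o /dev/null -w "%{http_code}\n" https://example.com/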

Nginx Configuration

For Nginx servers, add this to your server block or a separate configuration file:

nginx

# Block Meta AI Crawlers
if ($http_user_agent ~* "(meta-externalagent|FacebookBot|Meta-ExternalFetcher)") {
    return 403;
}

Some administrators prefer a single comprehensive pattern. Note that the ~* operator already matches case-insensitively, and that including facebookexternalhit here will break link previews:

nginx

# Block every Meta crawler, including the link preview bot
if ($http_user_agent ~* "(meta-externalagent|FacebookBot|Meta-ExternalFetcher|facebookexternalhit)") {
    return 403;
}
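Placement matters with Nginx: an if block that only performs a return is safe at the server level and applies before any location is evaluated. A minimal sketch of where the check sits (server name and listen port are placeholders):

nginx

server {
    listen 80;
    server_name example.com;

    # Reject Meta AI crawlers before any location processing
    if ($http_user_agent ~* "(meta-externalagent|FacebookBot|Meta-ExternalFetcher)") {
        return 403;
    }

    location / {
        # ... rest of your configuration
    }
}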

Cloudflare Users: The Easy Button

Cloudflare introduced a one-click AI bot blocking feature in July 2024, and over 1 million sites have already enabled it. This approach requires no technical configuration.

Navigate to Security > Bots in your Cloudflare dashboard. Enable "Block AI Scrapers and Crawlers" to automatically block known AI training bots, including Meta's crawlers.

Cloudflare also offers granular controls if you want to block specific bots while allowing others. The platform additionally provides managed robots.txt features that automatically add appropriate directives for AI crawlers.
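If you'd rather write the rule yourself than rely on the managed toggle, a custom WAF rule with the action set to Block can target Meta's user agents directly. A sketch in Cloudflare's rule expression language:

(http.user_agent contains "meta-externalagent") or
(http.user_agent contains "FacebookBot") or
(http.user_agent contains "Meta-ExternalFetcher")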

The Meta-Heavy Publisher Dilemma

Here's where things get tricky for publishers who rely heavily on Facebook and Instagram for traffic. Blocking Meta crawlers entirely could hurt your bottom line more than AI training ever will.

Understanding the Traffic Trade-Off

Consider these factors before implementing any blocks:

| Consideration | Impact of Blocking AI Crawlers Only | Impact of Blocking All Meta Crawlers |
|---|---|---|
| Link Previews | No change | Broken (bare URLs only) |
| Social Sharing | No change | Significantly reduced engagement |
| Referral Traffic | No change | Potentially major decline |
| Ad Revenue | No change | Revenue drop proportional to traffic loss |
| AI Training | Prevented | Prevented |

Publishers with 20% or more of their traffic coming from Facebook and Instagram should proceed with extreme caution. Blocking facebookexternalhit will hurt your social presence, and that social presence directly feeds your page views and ad revenue. Understanding how to manage and monitor your website ad revenue metrics becomes critical when making decisions that could impact your traffic sources.

The Selective Blocking Strategy

For Meta-dependent publishers, here's a balanced approach:

  • Block: meta-externalagent, FacebookBot, Meta-ExternalFetcher
  • Allow: facebookexternalhit

This configuration stops AI training while preserving your social media functionality. Your content won't feed Meta's large language models, but your links will still generate attractive previews when shared.

Rate Limiting: The Middle Ground

If you're experiencing server strain from aggressive Meta crawling but don't want to block entirely, rate limiting offers a compromise. This approach allows Meta's crawlers to access your content at a sustainable pace.

For Nginx, implement rate limiting with these directives:

nginx

# Rate limit Meta crawlers
# Matching user agents share one key; everything else maps to an
# empty string, which exempts it from the limit entirely
map $http_user_agent $meta_crawler {
    "~*meta-externalagent"  "meta";
    "~*facebookexternalhit" "meta";
    default                 "";
}

# 10 requests per minute, shared across all matching crawler traffic
limit_req_zone $meta_crawler zone=metalimit:10m rate=10r/m;

server {
    location / {
        limit_req zone=metalimit burst=5 nodelay;
        # ... rest of your configuration
    }
}

This configuration limits Meta crawlers to a combined 10 requests per minute with a burst allowance of 5 additional requests. Because non-matching user agents map to an empty key, ordinary visitors are never rate limited.
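To confirm the limit is active, you can send a burst of requests with the crawler's user agent and watch the status codes flip once the burst allowance runs out (Nginx answers rate-limited requests with 503 by default; example.com is a placeholder):

bash

# Send 20 rapid requests as the Meta crawler; after the burst
# allowance is exhausted, responses should switch from 200 to 503
for i in $(seq 1 20); do
  curl -A "meta-externalagent/1.1" -s -o /dev/null -w "%{http_code}\n" https://example.com/
done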

Verifying Your Blocks Are Working

Implementing blocks is only half the battle. You need to verify they're actually functioning. Use our free AI Crawler Protection Grader to analyze how well your website blocks AI crawlers from scraping your content.

Check Your Server Logs

Review your access logs for Meta user agent strings after implementing blocks. You should see 403 responses for blocked crawlers.

bash

grep -E "(meta-externalagent|FacebookBot)" /var/log/nginx/access.log

Successful blocks will show 403 status codes. Continued 200 responses indicate your blocks aren't working as intended.
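For a quick breakdown of exactly which status codes each crawler is receiving, a one-liner like this works against the default combined log format, where the status code is the ninth field (adjust if your format differs):

bash

# Tally response status codes for Meta's AI training crawler
grep "meta-externalagent" /var/log/nginx/access.log | awk '{print $9}' | sort | uniq -c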

Use Facebook's Sharing Debugger

For publishers who want to preserve link previews, test your URLs in Facebook's Sharing Debugger (developers.facebook.com/tools/debug/). If previews generate correctly, facebookexternalhit is still accessing your content as intended.

Monitor Crawl Frequency

Several tools can help you track which bots are accessing your site:

  • Dark Visitors: Provides analytics on AI crawler activity
  • Cloudflare Analytics: Shows bot traffic patterns for Cloudflare users
  • Server log analysis: Direct monitoring of user agent strings

Alternative Strategies for Content Protection

Beyond blocking, publishers have other options for managing AI crawler access. Rather than shutting AI out entirely, some publishers are finding value in getting AI tools to cite your website, an alternative strategy that can drive traffic instead of merely preventing access.

The Meta Tag Approach

Add this experimental meta tag to your page's head section. It signals to AI crawlers that you don't want your content used for training:

html

<meta name="robots" content="noai, noimageai">

The noai and noimageai directives aren't standardized and compliance is voluntary, but they represent another layer of communication with AI systems.
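The same non-standard signal can also be sent as an HTTP response header, which extends it to non-HTML assets like images and PDFs. A minimal Nginx sketch, with the same voluntary-compliance caveat:

nginx

# Send the noai signal on every response, including non-HTML files
add_header X-Robots-Tag "noai, noimageai";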

Strategic Content Structuring

Consider restructuring how you present valuable content. Placing premium insights behind interactions, using more visual content that's harder to scrape, or implementing progressive disclosure can reduce the value of automated scraping. Publishers focused on long-term growth should also consider how to build a content marketing strategy that monetizes your website more effectively while protecting their intellectual property.

Licensing and Legal Frameworks

Some publishers are exploring the TDM Reservation Protocol, which could provide legal frameworks for AI training opt-outs in jurisdictions like the European Union. This approach offers potential legal teeth beyond purely technical solutions. Implementing proper schema markup for your website can also help establish clear licensing signals for crawlers that respect structured data.
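For a concrete picture of what a TDMRep signal looks like, the draft specification defines simple response headers: tdm-reservation declares that text and data mining rights are reserved, and tdm-policy can point to a machine-readable licensing policy. A minimal Nginx sketch (the policy URL is a placeholder):

nginx

# Declare a TDM rights reservation per the TDMRep draft
add_header tdm-reservation "1";

# Optionally point crawlers to a machine-readable licensing policy
add_header tdm-policy "https://example.com/tdm-policy.json";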

Frequently Asked Questions About Blocking Meta AI

What is the difference between facebookexternalhit and meta-externalagent?

facebookexternalhit generates link previews when users share URLs on Facebook, Instagram, and Messenger. meta-externalagent crawls content specifically for AI model training. Blocking facebookexternalhit breaks your social sharing previews. Blocking meta-externalagent stops AI training without affecting social functionality.

Will blocking Meta AI affect my search rankings?

Blocking Meta AI crawlers has no impact on search engine rankings. Search engines like Google use separate crawlers (Googlebot) that are unaffected by Meta-specific blocks. Your SEO remains intact when you block Meta AI.

How do I know if Meta AI is crawling my site?

Check your server access logs for user agent strings containing "meta-externalagent," "FacebookBot," or "Meta-ExternalFetcher." High request volumes from these user agents indicate active AI crawling on your site.

Can I block Meta AI while keeping Facebook link previews?

Yes. Block meta-externalagent, FacebookBot, and Meta-ExternalFetcher while allowing facebookexternalhit. This selective approach stops AI training while preserving attractive link previews when users share your content on Meta platforms.

Maximizing Revenue From the Traffic You Keep

Blocking AI crawlers protects your content, but the ultimate goal remains monetizing your audience effectively. The traffic you retain after implementing crawler blocks is still your primary revenue driver.

Focus on optimizing the user experience for human visitors. High viewability, strategic ad placement, and premium demand partnerships matter far more to your bottom line than AI crawler decisions. Our guide on monetizing your website with ads from basic banners to advanced revenue optimization covers everything you need to maximize revenue from your existing traffic.

Your ad layout, demand stack, and yield optimization all remain critical regardless of your crawler blocking strategy. The readers who arrive via search, direct traffic, and whatever social channels you've preserved are the ones generating your ad revenue. Learn more about maximizing ad revenue through strategic website layout to ensure you're extracting maximum value from every page view.

Make Every Page View Count with Playwire

Your content deserves protection and your traffic deserves optimization. While blocking Meta AI crawlers helps preserve your intellectual property, maximizing revenue from your remaining traffic requires sophisticated yield management.

Playwire's RAMP platform combines AI-driven yield optimization with expert human oversight. Our proprietary algorithms analyze millions of data points to maximize revenue on every impression, while our dedicated yield ops team ensures your ad strategy adapts to the ever-changing digital advertising landscape.

Whether you're preserving 100% of your traffic or navigating the trade-offs of selective crawler blocking, we help publishers extract maximum value from their audience. Our advanced analytics provide the transparency you need to make informed decisions about both your content protection strategy and your monetization approach.

Ready to amplify your ad revenue? Contact Playwire to learn how we can help you earn more from the traffic you've worked hard to build and protect.
