Should publishers block AI crawlers in their robots.txt file?

There is no universal answer, but a blanket block is best treated as a starting position rather than a permanent strategy. Publishers with premium original content, proprietary data, or vertical expertise have negotiating leverage and benefit from blocking first, then permitting selectively. Smaller publishers without that bargaining power may not be able to secure licensing deals regardless, in which case the strategic priority shifts to maximizing revenue from existing traffic rather than holding out for compensation that may not materialize.

What is the difference between AI training crawlers and AI grounding crawlers?

Training crawlers pull from a broad archive of published content to build foundational model knowledge. Grounding crawlers retrieve current, trusted content in real time using Model Context Protocol (MCP) connections to inform a live AI inference response. Publisher compensation programs like Microsoft's Publisher Content Marketplace focus on grounding: publishers get paid each time their content is used in a live inference, not for contributing to a model's historical training data. This distinction matters because grounding creates a recurring, transaction-based revenue relationship rather than a one-time or unpaid contribution.

What is Microsoft's Publisher Content Marketplace and how does it compensate publishers?

Microsoft's Publisher Content Marketplace, announced in February 2025, is a licensing program that pays publishers when their content informs an AI inference response. Microsoft handles licensing agreements and runs the compute on Azure. As of the time of reporting, eight publishers had signed on, with the program aimed at eventually covering the open web. Publishers in the program receive payment on a per-inference basis tied to grounding use, not training data contribution.

Which AI crawlers should publishers never block?

Publishers should not block search indexing crawlers like Googlebot or Bingbot, as doing so directly harms organic search visibility and referral traffic. Licensing-eligible AI crawlers connected to platforms with active or developing publisher compensation programs are worth evaluating case by case rather than blocking wholesale. The crawlers worth blocking are unauthorized scrapers that offer no compensation model, no licensing relationship, and no discoverable benefit to the publisher.

How does blocking AI crawlers give publishers negotiating leverage?

When a publisher blocks a crawler, the AI platform cannot access that content without a licensing agreement. For publishers with content AI companies actively want, such as original reporting, vertical expertise, or proprietary data, this creates a supply constraint that shifts bargaining power toward the publisher. Granting open access removes that leverage entirely, effectively supplying the model for free. The blocking-first approach works best for premium publishers whose content is differentiated enough that AI platforms have an incentive to negotiate rather than simply move on to the next available source.

What can publishers do to make their content more likely to be cited by AI tools?

Content structure is the primary lever. Clear document hierarchy, authoritative sourcing, and direct answers to specific questions all increase the probability that AI tools surface your content as a cited source rather than absorbing it without attribution. This applies regardless of a publisher's crawler access policy: even publishers granting limited access benefit from structuring content in ways that AI grounding systems can parse and attribute. Publishers should also monitor which crawlers are actively hitting their sites, since the volume and type of bot traffic has changed significantly as AI systems have scaled.
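
One way to start monitoring is to count bot hits in your own access logs. The sketch below is a minimal example, assuming your server writes the common "combined" log format, where the user agent is the last quoted field. The token list and the sample log lines are illustrative assumptions, not an authoritative inventory; confirm each crawler's current token against its operator's documentation.

```python
import re
from collections import Counter

# In the combined log format, each line ends with: "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" "(?P<ua>[^"]*)"\s*$')

# Assumed tokens for illustration; verify against each operator's docs.
BOT_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "Googlebot", "bingbot"]


def count_bot_hits(log_lines, tokens=BOT_TOKENS):
    """Count requests per crawler token by substring-matching the user agent."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line is not in the expected format
        user_agent = match.group("ua").lower()
        for token in tokens:
            if token.lower() in user_agent:
                hits[token] += 1
                break  # attribute each request to one token only
    return hits


# Made-up sample lines (simplified user agents, documentation IP range).
SAMPLE_LOG = [
    '203.0.113.5 - - [27/May/2026:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [27/May/2026:10:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [27/May/2026:10:00:03 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '203.0.113.7 - - [27/May/2026:10:00:04 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/126.0"',
]

if __name__ == "__main__":
    for token, count in count_bot_hits(SAMPLE_LOG).most_common():
        print(token, count)
```

User agent strings can be spoofed, so treat these counts as a first look at bot traffic rather than verified crawler identity.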

Should Publishers Let AI Bots Crawl Their Sites?

Playwire Strategy Team

May 27, 2026

Ad Revenue Optimization AI Content Licensing AI Crawler Policy Referral Traffic Publisher Monetization Strategy

Key Points
Microsoft's VP of publisher product publicly advised publishers to stop blocking AI crawlers and optimize content for AI discovery, but a publisher already in Microsoft's licensing program pushed back with a more nuanced position.
The "block or not" debate misses the real strategic question: which crawlers get access, on what terms, and what do you get in return.
Microsoft's Publisher Content Marketplace pays publishers when their content informs an AI response. The program currently has only eight publishers, though it aims to eventually cover the entire open web.
Blocking everything first gives publishers negotiating leverage. Opening everything up gives AI companies free inventory.
Whatever you decide on crawlers, the traffic you still own needs to work harder. That's where monetization strategy matters most.

See It In Action:
Our Publishers Are Partners, Not Just Customers: How Playwire approaches publisher relationships and why it translates to better revenue outcomes
Publisher Earnings Index: Real earnings data from publishers across verticals to benchmark your own monetization performance
News Publisher Guide: A practical guide to ad monetization for news publishers, from setup to optimization

What Happened

According to AdExchanger's coverage of the Programmatic AI event in Las Vegas, Nikhil Kolar, VP of publisher product at Microsoft AI, told publishers they should open their sites to AI crawlers. His argument: if your content isn't legible to AI agents, your business isn't discoverable. Four out of five websites currently block AI bots, per Kolar. That means most publishers are effectively invisible to AI-driven recommendations and discovery.

Kolar also pointed to Microsoft's Publisher Content Marketplace, announced in February, as a path toward fair compensation. The model pays publishers when their content informs an AI inference. Microsoft handles the licensing agreements and runs the compute on Azure, which means Microsoft makes money on the cloud side regardless. Currently, eight publishers have signed on.

Jonathan Roberts, Chief Innovation Officer at People Inc. and one of those eight publishers, offered a different read. People Inc. blocks 30,000 to 35,000 crawlers per day, granting access to only 38. Roberts framed blocking as a control mechanism, not a wall: you block first, then permit selectively, and negotiate from a position of strength.

His actual disagreement with Kolar was narrower than it appeared. Kolar's advice about opening up access applies more to retail and merchant sites that want AI chatbots recommending their products. For content publishers with valuable IP, the calculus is different.

Essential Background Reading:
AI and Publishers Resource Center: The full picture on how AI is reshaping publisher strategy, revenue, and content discovery
AI Crawler Resource Center for Publishers: Everything publishers need to know about AI crawlers, blocking decisions, and access control
Block AI: A publisher-focused guide to deciding when and how to block AI crawlers from your site
AI Info: Core AI concepts and their implications for digital publishers and ad monetization

Why This Matters for Publishers

The underlying tension here isn't really about blocking. It's about who captures value from your content in the AI era.

Kolar shared a telling data point from 430 meetings Microsoft AI held with publishers last year: the most common sentiment was a feeling of powerlessness. Publishers watched social, mobile, and search reshape their traffic without being able to shape those shifts. AI feels like the same pattern repeating.

Free access feeds the models that answer user questions directly, cutting out the click-through to your site. Block everything, and you lose discoverability entirely. Neither extreme is a complete strategy.

The distinction Kolar drew between "training" and "grounding" is worth understanding. Training pulls from the broad archive of published content to build foundational model knowledge. Grounding pulls from current, trusted sources in real time using Model Context Protocol (MCP) connections. Microsoft's marketplace focuses on grounding, which means publishers participating get paid each time their content is used in a live inference, not just when it contributes to a model's training data. That's a meaningfully different economic relationship.

Related Content:
Generative AI and Publishers: How generative AI models are being built on publisher content and what the economics look like
AI Content Info: What happens to your content once AI crawlers get hold of it, and how to manage your exposure
AI Crawler Protection Grader: Assess your current crawler protection posture and identify gaps before they cost you
Publisher Ad Tech Stack: How your monetization infrastructure fits together and why it matters more as AI disrupts traffic patterns

What Publishers Should Do

There's no universal right answer here, but there is a decision framework worth applying.

Start by understanding what you're blocking and why. A blanket robots.txt block on all AI crawlers is easy to implement and gives you a clean baseline. It's a starting position, not a permanent strategy.
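
As a rough illustration of that baseline, the sketch below generates a robots.txt that disallows a list of AI crawler tokens site-wide and then checks the result with Python's standard `urllib.robotparser`. The token list, the helper names, and the example.com URL are assumptions for illustration; confirm each crawler's current token with its operator before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Assumed AI crawler tokens for illustration; verify against each operator's docs.
AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot"]


def build_blanket_block(tokens):
    """Return robots.txt text that disallows every listed crawler site-wide."""
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    # An empty Disallow leaves everyone else, including search indexers, unrestricted.
    groups.append("User-agent: *\nDisallow:")
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt, user_agent, url="https://example.com/article"):
    """Check whether the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots_txt = build_blanket_block(AI_CRAWLER_TOKENS)
    print(robots_txt)
    print(is_allowed(robots_txt, "GPTBot"))     # False
    print(is_allowed(robots_txt, "Googlebot"))  # True
```

Keep in mind that robots.txt is a voluntary convention. It only stops crawlers that choose to honor it, so scrapers that ignore it have to be handled at the server or CDN level.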

Publishers should evaluate crawlers by category:

Search indexers: essential. Blocking Googlebot or Bingbot hurts your organic traffic. Don't do it.
Licensing-eligible AI crawlers: these are crawlers connected to platforms that have or are building publisher compensation programs. Worth evaluating case by case.
Unauthorized scrapers: block these. They offer nothing and take everything.
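
The three categories above can be sketched as a small lookup that maps a bot's user agent to a default action. This is a minimal sketch, not a production classifier: the inventory is hypothetical ("examplelicensedbot" is a made-up name), and the input is assumed to be traffic you have already identified as automated, since anything unrecognized falls through to "block".

```python
from enum import Enum


class Category(Enum):
    SEARCH_INDEXER = "search indexer"
    LICENSING_ELIGIBLE = "licensing-eligible AI crawler"
    UNAUTHORIZED = "unauthorized scraper"


# Default action per category, following the three-way split above.
DEFAULT_ACTION = {
    Category.SEARCH_INDEXER: "allow",
    Category.LICENSING_ELIGIBLE: "review",  # evaluate case by case
    Category.UNAUTHORIZED: "block",
}

# Hypothetical inventory keyed by lowercase user agent token.
KNOWN_CRAWLERS = {
    "googlebot": Category.SEARCH_INDEXER,
    "bingbot": Category.SEARCH_INDEXER,
    "examplelicensedbot": Category.LICENSING_ELIGIBLE,  # made-up name
}


def decide_for_bot(user_agent):
    """Return 'allow', 'review', or 'block' for a user agent already known to be a bot."""
    ua = user_agent.lower()
    for token, category in KNOWN_CRAWLERS.items():
        if token in ua:
            return DEFAULT_ACTION[category]
    # Bots that match nothing in the inventory are treated as unauthorized.
    return DEFAULT_ACTION[Category.UNAUTHORIZED]


if __name__ == "__main__":
    for ua in ["Googlebot/2.1", "ExampleLicensedBot/1.0", "RandomScraper/0.1"]:
        print(ua, "->", decide_for_bot(ua))
```

Keeping the "review" bucket explicit makes the case-by-case decisions visible instead of burying them in an allow or block list.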

The leverage question matters, too. Roberts' point about negotiating from a blocking position applies most to publishers with content that AI companies actively want. If you're a premium publisher with original reporting, vertical expertise, or proprietary data, you have something worth licensing. Block first, negotiate second.

Smaller publishers without that bargaining power face a different reality. Roberts acknowledged the leverage dynamic weakens considerably at that scale. You can still control access, but a licensing deal may not materialize. In that case, the strategic priority shifts: maximize revenue from the traffic you do have.

One move benefits nearly every publisher regardless of where they land on the blocking debate: optimizing content structure for AI legibility. Clear structure, authoritative sourcing, and direct answers to specific questions all increase the likelihood that AI tools surface your content as a source rather than just absorbing it silently.

Next Steps:
Publisher Ad Revenue Maturity Model: Understand where your monetization strategy stands today and what it takes to reach the next level
Publisher Ad Revenue Maturity Model Assessment: Take the assessment to get a personalized read on your revenue operation's current state
Yield Experiment Playbook: A practical framework for running monetization experiments and improving revenue per session
News Publishers Ad Revenue Resource Center: Revenue strategies tailored specifically for news publishers navigating traffic volatility

Our Take

The "block or don't block" framing is a distraction. The real question is whether you're managing your content's AI exposure deliberately or just letting it happen to you.

Kolar put it directly: publishers have more power than they think. But that power only matters if they use it. Passive openness is not a strategy. Neither is reflexive blocking.

Whatever your crawling policy, the traffic you still own and control deserves serious monetization infrastructure. AI is reshaping discovery, but publishers who invest in squeezing maximum revenue per session from their existing audience are building something that doesn't depend on what any platform decides next.

We work with publishers across gaming, education, news, and entertainment to do exactly that. If you want to see what your current traffic is actually worth, our AI Crawler Protection Grader is a good place to start assessing your exposure, and our AI crawler resource center lays out the full decision framework for publishers navigating this.

You've built the audience. Don't leave revenue on the table while you figure out the rest.
