Learning Center

AI Scrapers Are the Ad Tech Tax, But Worse

May 5, 2026


Editorial Policy

All of our content is generated by subject matter experts with years of ad tech experience and structured by writers and educators for ease of use and digestibility. Learn more about our rigorous interview, content production and review process here.


Key Points

  • AI data brokers and content scrapers are extracting 100% of publisher content value while paying nothing back, making the ad tech tax look minor by comparison.
  • The scraper economy is estimated at $1 billion and growing, with at least 21 identified vendors rebranding as "agentic infrastructure" to obscure what they're doing.
  • Publishers face a whack-a-mole problem: locking down their own domains doesn't stop their syndicated content from being scraped through third-party portals.
  • The legal and financial responsibility is being shifted downstream, leaving publishers holding the bag while AI companies point fingers at portal settings.
  • Publishers who can't stop the traffic bleed need to squeeze maximum revenue from every session they still own.

What Happened

Digiday's reporting on the emerging AI scraper economy landed a quote that should be pinned to every publisher's wall: "30, 40, 50 startup DSPs for content, but they're taking a 100% fee."

One anonymous publishing exec compared the new crop of AI data brokers to DSPs for content. The difference from the ad tech middleman problem publishers spent years complaining about? At least DSPs moved money, even if they took a cut. These scrapers take everything and leave nothing.

Chris Dicker, CEO of Candr Media, called it plainly: "It's not a tax, it's a hostile takeover funded by our own IP." Media analyst Matthew Scott Goldstein's reporting puts the scraper economy at $1 billion, citing Mordor Intelligence data. He identified 21 vendors operating in this space, including Firecrawl, Exa, Tavily, Perplexity Sonar, and Bright Data. TollBit's index puts the count closer to 40.

The tactical playbook these companies use ranges from stealth crawlers that ignore robots.txt directives to public announcements that they simply won't comply. Dicker's framing: "If the message is 'no crawl,' then they need to remember that no means no."
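Publishing a "no crawl" directive is itself straightforward. A minimal robots.txt block for a few commonly cited AI crawlers might look like the sketch below; the user-agent tokens shown are illustrative, vary by vendor, and change over time, so check each company's current documentation before relying on them:

```
# Illustrative robots.txt directives blocking AI crawlers site-wide.
# Tokens are examples only; verify current ones with each vendor.
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /
```

As the reporting above makes clear, some crawlers ignore these directives entirely, so treat them as a baseline signal of non-consent rather than a defense.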

See It In Action:

  • Google AdX Integration: How publishers can maximize demand competition and CPMs through premium exchange access on sessions they retain
  • Blockthrough: Ad recovery technology that helps publishers recapture revenue from users who would otherwise generate zero impressions
  • Hadron ID: Identity solutions that help publishers maintain audience addressability and CPM premiums as traffic patterns shift

Why This Matters

The ad tech tax debate was always about opacity and proportionality. Publishers grudgingly accepted that intermediaries would take a slice, but they wanted to know the size of the slice and what they got for it. The scraper economy doesn't even pretend to offer that deal.

What makes the current situation worse is the rebranding layer. Goldstein flagged that scraper companies are now positioning themselves as "agentic infrastructure." The technology pitch gets cleaner. The underlying economics stay the same: extract content at scale, pay nothing, build competing products with the IP you just took.

The syndication trap compounds it further. Publishers who lock down their own domains still find their content appearing on third-party portals that carry their feeds. When they push back against AI firms over that scraped content, the response is reliably the same: "Talk to the portal about their settings." Responsibility gets shuffled until no one is holding it.

This mirrors what happened to the music industry before licensing frameworks existed. The Napster comparison Digiday's sources made is apt. Publishers are still waiting for their iTunes moment, and the pirates are moving faster.

Essential Background Reading:

  • Publisher Ad Tech Stack: Understand how modern publisher ad tech stacks are structured before diving into how scrapers disrupt them
  • Advertising Terms Glossary: Core ad tech terminology referenced throughout this article, from DSPs to CPMs to robots.txt
  • AI in Ad Tech: Overview of how AI is being applied across the ad tech ecosystem, both for publishers and against them
  • Playwire Learning Center: Publisher education resources covering monetization strategy, yield optimization, and ad tech fundamentals

What Publishers Should Do

The legal and technical picture here is complicated, and no single tactic closes every gap. That said, publishers do have levers to pull.

The most immediate moves worth evaluating:

  • robots.txt and blocking tools: Basic hygiene, but insufficient alone. Some crawlers ignore directives entirely. Use them anyway and document violations.
  • Legal and licensing pressure: Several publishers are pursuing litigation. It's slow, but licensing precedents, when they arrive, tend to move the whole industry.
  • Crawler monitoring: Tools like our AI Crawler Protection Grader help publishers identify which bots are actually hitting their properties and how well current blocking is holding.
  • Syndication contract review: If your content lives on third-party portals, your distribution agreements may need AI crawling restrictions written in explicitly.
  • Revenue-per-session focus: Traffic you can't defend, you lose. Traffic you still have, you optimize. More on this below.
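The "document violations" step in the first bullet can be partially automated. As a sketch (the bot name, log entries, and robots.txt content here are all hypothetical), Python's standard urllib.robotparser can replay logged requests against your own robots.txt rules and flag hits that should have been disallowed:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice, load your live file.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical access-log entries: (user_agent, path_requested).
log_entries = [
    ("ExampleAIBot", "/articles/some-post"),
    ("Googlebot", "/articles/some-post"),
]

# Any request robots.txt disallows is a documented violation.
violations = [
    (agent, path)
    for agent, path in log_entries
    if not parser.can_fetch(agent, path)
]
print(violations)
```

Real access logs identify bots by full user-agent strings rather than clean tokens, so production monitoring also needs user-agent normalization and, ideally, reverse-DNS checks to catch crawlers that spoof their identity.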

The whack-a-mole reality means no publisher will achieve perfect protection in the near term. The strategic question shifts: how do you protect what you can, and maximize the value of what remains?

| Threat Vector | Current Exposure | Mitigation Lever |
| --- | --- | --- |
| Direct domain crawling | High | robots.txt, bot blocking, legal action |
| Syndicated content scraping | High | Portal contract updates, monitoring |
| Stealth/undeclared crawlers | Medium-High | Crawler detection tools, traffic analysis |
| Competing AI products built on your IP | Long-term | Licensing frameworks, litigation |

What Publishers Can Control Right Now

The scraper problem is real and legal resolution is years away. What isn't years away is the impact on traffic and revenue. Publishers losing sessions to AI-generated answers need their remaining traffic to work harder.

That means RPS, not just CPMs. Revenue per session accounts for ad layout, format mix, viewability, fill rates, and demand competition all at once. Squeezing an extra few percentage points out of each session compounds fast when organic traffic is under structural pressure.
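As a back-of-the-envelope illustration of why RPS captures more than CPM alone (all numbers here are hypothetical), the inputs the paragraph lists combine like this:

```python
# Hypothetical per-session inputs; real values come from your ad server.
sessions = 100_000
impressions_per_session = 4.0   # driven by ad layout and format mix
fill_rate = 0.90                # share of impressions that actually serve
avg_cpm = 2.50                  # USD per 1,000 filled impressions

filled_impressions = sessions * impressions_per_session * fill_rate
revenue = filled_impressions * avg_cpm / 1000.0
rps = revenue / sessions

print(f"Revenue: ${revenue:,.2f}, RPS: ${rps:.4f}")
```

Because RPS is the product of several factors, a few-percent lift in any one of them (fill, CPM, impressions per session) multiplies through, which is why small yield improvements compound when traffic is shrinking.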

We work with publishers across gaming, education, news, and entertainment who are navigating exactly this environment. The ones doing it well aren't just defending against scrapers. They're running tighter yield operations on the sessions they still own, and treating every user who shows up as worth significantly more than they did two years ago.

Next Steps:

  • Build a Stronger Ad Tech Stack: Evaluate your current stack architecture and identify where revenue is leaking beyond just scraper exposure
  • RAMP Self-Service Platform: Take control of your monetization with a self-service platform built for publishers who want visibility and performance
  • Google Ad Manager Resources: Optimize your GAM setup to maximize CPMs and RPS from every session your traffic defenses successfully retain
  • Fraudlogix Integration: Invalid traffic and bot detection tools that complement your AI crawler blocking strategy
  • Pixalate: Ad fraud prevention and traffic quality analysis to help distinguish legitimate sessions from bot-driven ones

Our Perspective

The ad tech tax was a legitimate grievance. Publishers knew the game, even when they didn't love the rules. The scraper economy is a different category of problem: no rules, no compensation, no reciprocity.

We're not going to pretend there's a clean technical fix for a problem that is fundamentally legal and structural. What we can do is help publishers get more out of the audience they still have, with the transparency to see exactly where that value is coming from.

If you want to see how your current crawler exposure looks, start with our AI Crawler Protection Grader. And if the broader resource context is useful, our AI crawler resource center for publishers covers the full picture.
