use case

The hidden cost of unwanted bot traffic — and how AI scrapers inflate it

the short answer

Unwanted bots cost you bandwidth, origin compute, cache pollution and skewed analytics; with bots making up roughly half of all web traffic and AI scrapers crawling aggressively for zero return traffic, blocking the worst offenders in robots.txt directly reduces server load and hosting bills.

Bot traffic does not show up as a line item on your invoice, which is exactly why it is easy to overpay for. Every automated request still costs bandwidth, still wakes your origin server, and still consumes the same metered resources a real visitor would — except a scraper might request thousands of pages an hour and buy nothing, read nothing, and send no one your way.

When close to half of all web traffic is automated, the share you are paying to serve bots is not a rounding error. Here is where the cost actually lands, and how to claw it back.

~50% of web traffic is bots, a large slice of what your hosting bill pays to serve (Imperva, 2024)
[Screenshot: robot.guard dashboard at robotguard.ogbuilds.ai. Summary tiles: 18 AI scrapers blocked, ~41% crawl load shed, 3 sites guarded. Saved configs: main site (example.com/robots.txt, 5 allowed, 11 blocked, active); docs (docs.example.com/robots.txt, 6 allowed, 9 blocked, active); shop (shop.example.com/robots.txt, 4 allowed, 13 blocked, draft).]

where this happens in the app

the dashboard ties guarding to the thing that matters — scrapers shut out and crawl load shed across every site you manage.

  1. what guarding bought back: ai scrapers blocked and the share of crawl load you shed.
  2. a saved config per site, each with its own allow / block rule counts.

Where the money goes

The obvious cost is bandwidth: metered egress on a cloud host or CDN is billed per gigabyte, and an aggressive crawler downloading your whole site repeatedly adds up. Less obvious is origin compute — every uncached request a bot makes can spin up a server-rendered page, a database query, and an API call, which on usage-priced platforms is billed by active CPU time.
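To make the arithmetic concrete, here is a rough sketch of how one crawler's bandwidth and origin compute might be priced. Every number in it (crawl rate, response size, cache miss share, CPU time, and both unit prices) is a hypothetical placeholder, not a figure from any provider, so substitute values from your own logs and invoice.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average month length in hours


@dataclass
class CrawlerLoad:
    """Hypothetical traffic profile for a single crawler."""
    requests_per_hour: float
    avg_response_kb: float        # average response size in kilobytes
    uncached_fraction: float      # share of requests that reach the origin
    cpu_ms_per_origin_hit: float  # active CPU time per uncached request


def monthly_cost(load: CrawlerLoad, egress_usd_per_gb: float,
                 cpu_usd_per_hour: float) -> dict:
    """Estimate a month of egress and origin compute for one crawler."""
    requests = load.requests_per_hour * HOURS_PER_MONTH
    # Decimal units: 1 GB = 1,000,000 KB.
    egress_gb = requests * load.avg_response_kb / 1_000_000
    # Only cache misses wake the origin; convert milliseconds to hours.
    cpu_hours = (requests * load.uncached_fraction
                 * load.cpu_ms_per_origin_hit / 3_600_000)
    egress_usd = egress_gb * egress_usd_per_gb
    compute_usd = cpu_hours * cpu_usd_per_hour
    return {
        "requests": requests,
        "egress_gb": egress_gb,
        "egress_usd": egress_usd,
        "compute_usd": compute_usd,
        "total_usd": egress_usd + compute_usd,
    }


# Assumed example inputs: 5,000 requests/hour at 200 KB each, 30% cache misses.
example = CrawlerLoad(requests_per_hour=5_000, avg_response_kb=200,
                      uncached_fraction=0.3, cpu_ms_per_origin_hit=150)
estimate = monthly_cost(example, egress_usd_per_gb=0.09, cpu_usd_per_hour=0.10)
```

The estimate ignores request fees, tiered pricing, and free allowances, so treat it as an order-of-magnitude check rather than a bill forecast.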

Then there are the second-order costs. Heavy crawlers pollute your cache with rarely-requested URLs, pushing out content real users need. They inflate your analytics so your traffic graphs lie. And during a crawl spike they compete with humans for the same capacity, slowing the site for the visitors you actually want.
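One way to see how much of this lands on your own site is to tally requests and bytes per crawler from an access log. The sketch below assumes logs in the common "combined" format and matches a small, illustrative list of user-agent tokens; the sample lines are invented, and because user agents can be spoofed, the result is a rough signal rather than a precise measurement.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user-agent fields of a
# combined-format log line. Assumes that format; adjust for your server.
LOG_PATTERN = re.compile(
    r'"(?:GET|POST|HEAD) \S+ [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Illustrative tokens only, not a complete or authoritative list.
BOT_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Googlebot", "bingbot")


def classify(agent: str) -> str:
    """Return the first known crawler token found in the user agent."""
    lowered = agent.lower()
    for token in BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return "unclassified"  # may be human or an unlisted bot


def tally(lines):
    """Count hits and response bytes per crawler label."""
    hits, bytes_out = Counter(), Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        label = classify(match["agent"])
        hits[label] += 1
        size = match["bytes"]
        bytes_out[label] += 0 if size == "-" else int(size)
    return hits, bytes_out


# Invented sample lines for illustration.
SAMPLE = [
    '203.0.113.7 - - [09/Jun/2026:10:00:00 +0000] "GET /docs/a HTTP/1.1" '
    '200 51200 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.7 - - [09/Jun/2026:10:00:01 +0000] "GET /docs/b HTTP/1.1" '
    '200 51200 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.4 - - [09/Jun/2026:10:00:02 +0000] "GET / HTTP/1.1" '
    '200 2048 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0) Firefox/126.0"',
]
hits, bytes_out = tally(SAMPLE)
```

Comparing the per-crawler byte totals against the "unclassified" bucket gives a first estimate of how much egress is going to bots that send nothing back.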

Why AI scrapers changed the maths

Search crawlers have always cost something, but the trade was fair: you served Googlebot, Google sent you visitors. AI training crawlers break that exchange. They can crawl broadly and often to keep datasets fresh, and the return is zero referral traffic — your content goes into a model, not in front of a reader.

Because most AI crawlers honour robots.txt, the cheapest fix is also the simplest: disallow the ones that take the most and give back nothing. You do not need new infrastructure, just a correct, current robots.txt — which is the whole point of managing it deliberately instead of leaving a stale file in place.
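As a minimal sketch of what that looks like in practice, the snippet below builds a robots.txt that disallows a few AI crawler tokens and then checks the result with Python's standard `urllib.robotparser`. The agent list is an illustrative assumption, not robot.guard's actual blocklist, and the helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative user-agent tokens; review current crawler documentation
# before relying on any list like this.
BLOCKED_AGENTS = ["GPTBot", "CCBot", "ClaudeBot"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that blocks the given agents and allows the rest."""
    lines = []
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url):
    """Check how a compliant crawler would interpret the rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = build_robots_txt(BLOCKED_AGENTS)
```

Checking the generated file before deploying it guards against the most expensive mistake, which is accidentally disallowing a search crawler you want. Note that this only verifies how a compliant crawler reads the rules; it does nothing about bots that ignore robots.txt.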

What different bots cost you

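|                    | Search crawler | AI training scraper | Bad bot  |
|--------------------|----------------|---------------------|----------|
| Sends you traffic  | Yes            | No                  | No       |
| Honours robots.txt | Yes            | Usually             | No       |
| Worth allowing     | Yes            | Your call           | Never    |
| How to stop it     | Don't          | robots.txt          | Firewall |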

frequently asked

How much can blocking bots actually save?
It depends on how heavily you're crawled, but on metered bandwidth and usage-priced compute, trimming aggressive scrapers can noticeably cut egress and origin costs — with no downside if the bots send you no traffic.
Won't a CDN absorb the cost anyway?
A CDN helps with cached static assets, but dynamic pages, cache misses, and CDN egress are still billed. A crawler that honours a robots.txt disallow never sends the request at all, which is cheaper than serving it from anywhere.
Do bots really skew analytics?
Yes. Unfiltered bot hits inflate pageviews and distort referrer and geography data, which leads to decisions based on traffic that was never human.
Which bots should I never block?
The crawlers that drive your visibility — Googlebot, Bingbot, and the social bots that build link previews. robot.guard keeps those on the allow list by default.

Last updated June 9, 2026
