The cost of unwanted bot traffic: how AI scrapers hit your budget


By ogbuilds, the studio behind robot·guard · updated 2026-06-09

the short answer

Unwanted bots cost you bandwidth and origin compute, pollute your cache and skew your analytics. Bots make up roughly half of all web traffic, and AI scrapers crawl aggressively while sending you no visitors in return. Since most of those crawlers respect robots.txt, disallowing the worst offenders there directly reduces server load and hosting bills.

Bot traffic does not show up as a line item on your invoice, which is exactly why it is easy to overpay for. Every automated request still costs bandwidth, still wakes your origin server, and still consumes the same metered resources a real visitor would — except a scraper might request thousands of pages an hour and buy nothing, read nothing, and send no one your way.

When close to half of all web traffic is automated, the share you are paying to serve bots is not a rounding error. Here is where the cost actually lands, and how to claw it back.

~49.6% of web traffic in 2023 was bots: a large slice of what your hosting bill pays to serve. Source: Imperva, 2024 Bad Bot Report

Where the money goes

The obvious cost is bandwidth: metered egress on a cloud host or CDN is billed per gigabyte, and an aggressive crawler that downloads your whole site again and again adds up fast. Less obvious is origin compute: every uncached request a bot makes can trigger a server-side page render, a database query, and an API call, which usage-priced platforms bill by execution time or, on some platforms, by active CPU time.
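To see how these two costs add up, here is a back-of-the-envelope sketch. Every input is a hypothetical placeholder, including the traffic profile and both unit prices, and none of them is any provider's real rate. It also treats 1 GB as 1,000,000 KB and assumes each cache miss costs a fixed amount of origin CPU time.

```python
from dataclasses import dataclass


@dataclass
class CrawlProfile:
    """Hypothetical description of one crawler's traffic; substitute your own figures."""
    requests_per_day: int
    avg_response_kb: float   # average response size in KB (1 GB treated as 1_000_000 KB)
    cache_miss_rate: float   # fraction of requests that reach the origin, 0.0 to 1.0
    cpu_ms_per_miss: float   # assumed origin CPU time per uncached request, in milliseconds


def monthly_cost(profile: CrawlProfile, usd_per_gb: float,
                 usd_per_cpu_hour: float, days: int = 30) -> dict[str, float]:
    """Estimate egress and origin compute for one crawler over `days` days."""
    requests = profile.requests_per_day * days
    # Every response is counted as egress, whether the cache or the origin served it.
    egress_gb = requests * profile.avg_response_kb / 1_000_000
    # Only cache misses reach the origin and use CPU time (3_600_000 ms per hour).
    cpu_hours = requests * profile.cache_miss_rate * profile.cpu_ms_per_miss / 3_600_000
    return {
        "egress_gb": egress_gb,
        "egress_usd": egress_gb * usd_per_gb,
        "cpu_hours": cpu_hours,
        "cpu_usd": cpu_hours * usd_per_cpu_hour,
    }


if __name__ == "__main__":
    # Placeholder crawler: 100k requests/day, 100 KB responses, half of them cache misses.
    scraper = CrawlProfile(requests_per_day=100_000, avg_response_kb=100,
                           cache_miss_rate=0.5, cpu_ms_per_miss=120)
    # Placeholder prices, not real provider rates.
    costs = monthly_cost(scraper, usd_per_gb=0.09, usd_per_cpu_hour=0.10)
    for key, value in costs.items():
        print(f"{key:11} {value:10.2f}")
```

With those placeholder inputs it reports 300 GB of egress and 50 CPU-hours a month for a single crawler. The dollar figures only become meaningful once you plug in your own host's rates.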

Then there are the second-order costs. Heavy crawlers pollute your cache with rarely-requested URLs, pushing out content real users need. They inflate your analytics so your traffic graphs lie. And during a crawl spike they compete with humans for the same capacity, slowing the site for the visitors you actually want.
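Cache pollution is easiest to see in a toy model. The sketch below assumes a plain LRU cache, although real CDNs use more elaborate eviction policies. It also uses a deliberately extreme workload: the cache has room for three pages, humans cycle through three popular pages, and a crawler requests one never-repeated archive URL after each human request. All names and sizes are invented for illustration.

```python
from collections import OrderedDict
from itertools import count


class LRUCache:
    """Minimal least-recently-used cache that tracks URLs only (no stored bodies)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: OrderedDict[str, None] = OrderedDict()

    def request(self, url: str) -> bool:
        """Return True on a hit; on a miss, insert url and evict the oldest entry if full."""
        if url in self._items:
            self._items.move_to_end(url)  # mark as most recently used
            return True
        self._items[url] = None
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used URL
        return False


def human_hit_rate(capacity: int, rounds: int, crawler_urls_per_human_request: int) -> float:
    """Hit rate seen by humans when a crawler interleaves unique, hypothetical archive URLs."""
    cache = LRUCache(capacity)
    popular = [f"/page/{i}" for i in range(capacity)]  # as many popular pages as cache slots
    crawl_ids = count()
    hits = total = 0
    for _ in range(rounds):
        for url in popular:
            hits += cache.request(url)
            total += 1
            for _ in range(crawler_urls_per_human_request):
                cache.request(f"/archive/{next(crawl_ids)}")  # each crawler URL is new
    return hits / total


if __name__ == "__main__":
    print(human_hit_rate(capacity=3, rounds=10, crawler_urls_per_human_request=0))  # 0.9
    print(human_hit_rate(capacity=3, rounds=10, crawler_urls_per_human_request=1))  # 0.0
```

Without the crawler, only the first round misses (27 hits out of 30). With it, each popular page is evicted before a human asks for it again, so every human request falls through to the origin. Real traffic is far less tidy, but the mechanism is the same.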

Why AI scrapers changed the maths

Search crawlers have always cost something, but the trade was fair: you served Googlebot, Google sent you visitors. AI training crawlers break that exchange. They can crawl broadly and often to keep datasets fresh, and the return is zero referral traffic — your content goes into a model, not in front of a reader.

Because most AI crawlers honour robots.txt, the cheapest fix is also the simplest: disallow the ones that take the most and give back nothing. You do not need new infrastructure, just a correct, current robots.txt — which is the whole point of managing it deliberately instead of leaving a stale file in place.
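A robots.txt that disallows a few AI training crawlers and leaves everything else open can be checked locally with Python's standard-library `urllib.robotparser` before you deploy it. The three crawler tokens here (GPTBot, CCBot, ClaudeBot) are examples only, not a complete or recommended list, so check each operator's documentation for its current token. `urllib.robotparser` applies the first matching group and rule in file order, which is simpler than the way some crawlers read the file. Treat the result as a sanity check, not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: one group blocks three AI training crawlers (illustrative
# tokens, not a recommended list); the catch-all group allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: CCBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text locally instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def check(parser: RobotFileParser, agents: list[str], path: str) -> dict[str, bool]:
    """Report whether each user-agent token may fetch `path` under the parsed rules."""
    return {agent: parser.can_fetch(agent, path) for agent in agents}


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    agents = ["GPTBot", "CCBot", "ClaudeBot", "Googlebot", "Bingbot"]
    for agent, allowed in check(rules, agents, "/blog/post").items():
        print(f"{agent:10} {'allowed' if allowed else 'blocked'}")
```

Running it shows the three AI crawlers as blocked and Googlebot and Bingbot as allowed. Bad bots that ignore robots.txt are unaffected, so this check says nothing about them.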

What different bots cost you

	Search crawler	AI training scraper	Bad bot
Sends you traffic	Yes	No	No
Honours robots.txt	Yes	Usually	No
Worth allowing	Yes	Your call	Never
How to stop it	Don't block	robots.txt	Firewall

frequently asked

How much can blocking bots actually save?

It depends on how heavily you're crawled, but on metered bandwidth and usage-priced compute, trimming aggressive scrapers can noticeably cut egress and origin costs — with no downside if the bots send you no traffic.
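Before estimating savings, it helps to measure what crawlers cost you today. This is a minimal sketch that assumes your server writes the Combined Log Format, which both Apache and nginx can produce. It tallies requests and response bytes by user-agent category. The category names, the token lists and the sample log lines are all hypothetical. User agents can be spoofed, and bad bots rarely identify themselves, so those end up in `other` alongside human visitors.

```python
import re
from collections import defaultdict

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<ua>[^"]*)"'
)

# Hypothetical categories: lowercase substrings matched against the user agent, in order.
CATEGORIES = {
    "ai_scraper": ("gptbot", "ccbot", "claudebot"),
    "search": ("googlebot", "bingbot"),
}


def classify(user_agent: str) -> str:
    """Return the first category whose token appears in the user agent, else 'other'."""
    ua = user_agent.lower()
    for category, tokens in CATEGORIES.items():
        if any(token in ua for token in tokens):
            return category
    return "other"  # humans plus any bot that does not identify itself


def summarize(lines) -> dict[str, dict[str, int]]:
    """Count requests and response bytes per category; skip unparseable lines."""
    totals = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue
        bucket = totals[classify(match["ua"])]
        bucket["requests"] += 1
        # A "-" byte count means no response body was logged as sent.
        bucket["bytes"] += 0 if match["bytes"] == "-" else int(match["bytes"])
    return dict(totals)


# Made-up sample lines for illustration only.
SAMPLE = [
    '203.0.113.5 - - [09/Jun/2026:10:00:00 +0000] "GET /blog/a HTTP/1.1" 200 51200 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.6 - - [09/Jun/2026:10:00:01 +0000] "GET /blog/b HTTP/1.1" 200 48000 "-" "CCBot/2.0"',
    '198.51.100.7 - - [09/Jun/2026:10:00:02 +0000] "GET / HTTP/1.1" 200 20000 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '192.0.2.8 - - [09/Jun/2026:10:00:03 +0000] "GET /blog/a HTTP/1.1" 304 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

if __name__ == "__main__":
    summary = summarize(SAMPLE)
    total_bytes = sum(v["bytes"] for v in summary.values())
    for category, v in sorted(summary.items()):
        share = v["bytes"] / total_bytes if total_bytes else 0.0
        print(f"{category:10} {v['requests']:3} req {v['bytes']:8} bytes {share:6.1%}")
```

On the four sample lines, the two AI scraper requests account for 99,200 of the 119,200 bytes served. Run the same tally over a real week of logs to see whether blocking is worth it on your site.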

Won't a CDN absorb the cost anyway?

A CDN helps with cached static assets, but dynamic pages, cache misses, and CDN egress are still billed. A crawler that honours robots.txt never sends the request at all, which is cheaper than serving it from anywhere.

Do bots really skew analytics?

Yes. Unfiltered bot hits inflate pageviews and distort referrer and geography data, which leads to decisions based on traffic that was never human.

Which bots should I never block?

The crawlers that drive your visibility — Googlebot, Bingbot, and the social bots that build link previews. robot·guard keeps those on the allow list by default.

Last updated June 9, 2026