Where the money goes
The obvious cost is bandwidth: metered egress on a cloud host or CDN is billed per gigabyte, and an aggressive crawler downloading your whole site repeatedly adds up. Less obvious is origin compute — every uncached request a bot makes can spin up a server-rendered page, a database query, and an API call, which on usage-priced platforms is billed by active CPU time.
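To make those two line items concrete, here is a rough back-of-the-envelope sketch. Every number in it is a made-up placeholder rather than a real price from any provider, and `monthly_bot_cost` is a hypothetical helper, not part of any tool. It also simplifies by charging egress on every response and compute only on the uncached share.

```python
def monthly_bot_cost(requests_per_day, avg_response_kb, egress_usd_per_gb,
                     uncached_fraction, cpu_seconds_per_request,
                     cpu_usd_per_hour, days=30):
    """Rough monthly cost of bot traffic: metered egress plus origin compute."""
    total_requests = requests_per_day * days

    # Bandwidth: every response is billed as egress (decimal units, 1 GB = 1,000,000 kB).
    egress_gb = total_requests * avg_response_kb / 1_000_000
    egress_usd = egress_gb * egress_usd_per_gb

    # Compute: only uncached requests reach the origin and burn active CPU time.
    cpu_hours = total_requests * uncached_fraction * cpu_seconds_per_request / 3600
    compute_usd = cpu_hours * cpu_usd_per_hour

    return {
        "egress_usd": round(egress_usd, 2),
        "compute_usd": round(compute_usd, 2),
        "total_usd": round(egress_usd + compute_usd, 2),
    }


# Hypothetical inputs: substitute figures from your own logs and invoice.
estimate = monthly_bot_cost(
    requests_per_day=200_000,
    avg_response_kb=150,
    egress_usd_per_gb=0.09,
    uncached_fraction=0.5,
    cpu_seconds_per_request=0.2,
    cpu_usd_per_hour=0.10,
)
print(estimate)
```

With these placeholder inputs the estimate comes to roughly $98 a month, about $81 of it egress. The point is the shape of the calculation, not the figure: request volume multiplies both terms, so cutting crawler requests cuts both.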
Then there are the second-order costs. Heavy crawlers pollute your cache with rarely-requested URLs, pushing out content real users need. They inflate your analytics so your traffic graphs lie. And during a crawl spike they compete with humans for the same capacity, slowing the site for the visitors you actually want.
Why AI scrapers changed the maths
Search crawlers have always cost something, but the trade was fair: you served Googlebot, Google sent you visitors. AI training crawlers break that exchange. They can crawl broadly and often to keep datasets fresh, and the return is zero referral traffic — your content goes into a model, not in front of a reader.
Because most AI crawlers honour robots.txt, the cheapest fix is also the simplest: disallow the ones that take the most and give back nothing. You do not need new infrastructure, just a correct, current robots.txt — which is the whole point of managing it deliberately instead of leaving a stale file in place.
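As a minimal sketch of what that file might look like, the snippet below builds a robots.txt that disallows a handful of AI training crawlers and then checks the result with Python's standard `urllib.robotparser`. The user-agent tokens listed are illustrative assumptions; crawler names change, so confirm the current tokens in each vendor's documentation. `build_robots_txt` and `is_allowed` are hypothetical helpers written for this example.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list only: verify current tokens against each vendor's docs.
AI_CRAWLERS = ["GPTBot", "CCBot", "ClaudeBot", "Google-Extended"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that blocks the given agents and allows everyone else."""
    lines = []
    for agent in blocked_agents:
        # One group per blocked crawler, separated by a blank line.
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Catch-all group: search crawlers and everything else stay allowed.
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(AI_CRAWLERS)

# Parse the generated text locally instead of fetching it over the network.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def is_allowed(agent, url="https://example.com/any-page"):
    """True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(agent, url)


print(robots_txt)
for agent in ["Googlebot", "Bingbot", "GPTBot", "CCBot"]:
    print(agent, "allowed" if is_allowed(agent) else "blocked")
```

Checking the file with a parser before deploying it guards against the easy mistake of blocking a search crawler along with the scrapers. It only tells you what a compliant crawler would do; bots that ignore robots.txt still need to be stopped at the firewall.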
What different bots cost you
| | Search crawler | AI training scraper | Bad bot |
|---|---|---|---|
| Sends you traffic | Yes | No | No |
| Honours robots.txt | Yes | Usually | No |
| Worth allowing | Yes | Your call | Never |
| How to stop it | Don't | robots.txt | Firewall |
Frequently asked questions
- How much can blocking bots actually save?
- It depends on how heavily you're crawled, but on metered bandwidth and usage-priced compute, trimming aggressive scrapers can noticeably cut egress and origin costs — with no downside if the bots send you no traffic.
- Won't a CDN absorb the cost anyway?
- A CDN helps with cached static assets, but dynamic pages, cache misses, and CDN egress are still billed. Stopping the request at robots.txt is cheaper than serving it from anywhere.
- Do bots really skew analytics?
- Yes. Unfiltered bot hits inflate pageviews and distort referrer and geography data, which leads to decisions based on traffic that was never human.
- Which bots should I never block?
- The crawlers that drive your visibility — Googlebot, Bingbot, and the social bots that build link previews. robot.guard keeps those on the allow list by default.
Last updated June 9, 2026