
How to block specific AI bots from scraping your website

By ogbuilds, the studio behind robot·guard · updated 2026-06-09

the short answer

To block AI bots, add a User-agent line for each AI crawler (GPTBot, ClaudeBot, CCBot, Google-Extended, PerplexityBot, Bytespider, and others) followed by Disallow: / in your robots.txt; the compliant ones stop crawling, and robot·guard maintains the user-agent list so you don't have to track new ones by hand.

Blocking an AI bot in robots.txt is mechanically simple: you name its user-agent and disallow the paths you want to protect. The hard part is knowing which user-agents to name, keeping that list current as new crawlers launch, and doing it without accidentally blocking the search bots you depend on.

Here is the practical method, the exact rules, and where the manual approach falls down.

GPTBot: one of dozens of AI crawler user-agents you'd otherwise track by hand


The rule that blocks an AI crawler

Each block has two parts: the User-agent line naming the bot, and a Disallow line naming what it can't touch. To shut a crawler out of your whole site, write Disallow: / under its user-agent. To block only part of the site, disallow a specific path such as /articles. A compliant crawler picks up its block the next time it fetches your robots.txt and stops requesting the disallowed paths; because crawlers cache the file, a change can take up to a day to apply.
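To see both forms side by side, here is a minimal sketch that feeds a small robots.txt to Python's standard-library `urllib.robotparser` and asks which URLs two crawlers may fetch. The parser approximates how a compliant crawler reads the file; it is not any vendor's actual crawler, and `example.com` is a placeholder host.

```python
from urllib.robotparser import RobotFileParser

# GPTBot is shut out of the whole site; CCBot is kept out of /articles only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /articles
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder host; only the URL path is matched.
for agent in ("GPTBot", "CCBot"):
    for url in ("https://example.com/", "https://example.com/articles/launch"):
        verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(f"{agent:<7} {url:<38} {verdict}")
```

GPTBot comes out blocked on both URLs; CCBot is allowed on the home page and blocked under /articles. Matching is by path prefix, so Disallow: /articles also covers /articles-archive; write Disallow: /articles/ if you mean only that directory.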

The common AI training and answer-engine user-agents include GPTBot (OpenAI), ClaudeBot and anthropic-ai (Anthropic), CCBot (Common Crawl, the dataset many models train on), Google-Extended (Google's AI training token, separate from Googlebot), PerplexityBot, and Bytespider (ByteDance). Blocking Google-Extended does not affect your Google Search ranking — it is a distinct token precisely so you can opt out of AI training while staying in search.
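To turn the list above into a file, a short script can emit one group per user-agent and then check the result with Python's standard `urllib.robotparser`. This is a minimal sketch: the `build_blocklist` helper is hypothetical, and the token list holds only the names mentioned here, so check each operator's documentation for its current tokens.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in this guide; operators add and rename tokens over time,
# so treat this as a starting point rather than a complete list.
AI_USER_AGENTS = [
    "GPTBot",           # OpenAI
    "ClaudeBot",        # Anthropic
    "anthropic-ai",     # Anthropic
    "CCBot",            # Common Crawl
    "Google-Extended",  # Google AI training token (not Googlebot)
    "PerplexityBot",    # Perplexity
    "Bytespider",       # ByteDance
]


def build_blocklist(agents: list[str], path: str = "/") -> str:
    """Return robots.txt text with one Disallow group per user-agent."""
    groups = [f"User-agent: {agent}\nDisallow: {path}" for agent in agents]
    return "\n\n".join(groups) + "\n"


robots_txt = build_blocklist(AI_USER_AGENTS)
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("Google-Extended", "https://example.com/"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/"))        # True
```

The last two lines confirm the point about Google-Extended: it is blocked, while Googlebot, which no group names, stays allowed by default. Stacking several User-agent lines above a single Disallow is also valid under RFC 9309; one group per bot is wordier but easier to toggle individually.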

Why a manager beats a static list

A hand-written blocklist is a snapshot that starts going stale the moment you save it. New AI crawlers appear regularly, each with its own user-agent, and your file silently keeps letting them in until you notice and add them. There is also the everyday risk of a typo: a misspelled user-agent fails to match the bot you meant to block, and a stray rule can shut out a crawler you rely on.

robot·guard handles both. It keeps a curated, maintained list of known AI scrapers — with who runs each one and what it does — so you toggle them off instead of hunting documentation, and it writes valid directives so you never block Googlebot by accident. You stay in charge of which ones to block; the tool just keeps the menu current and the syntax correct.
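If you maintain the file by hand, a few lines of script can catch both failure modes. The sketch below is not how robot·guard works internally; it flags any User-agent name missing from a reference list (a likely typo) and uses Python's `urllib.robotparser` to confirm Googlebot and Bingbot can still fetch the home page. The reference sets and the `lint_robots` name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reference lists; a real check needs a maintained source.
KNOWN_AI_AGENTS = {"gptbot", "claudebot", "anthropic-ai", "ccbot",
                   "google-extended", "perplexitybot", "bytespider"}
SEARCH_AGENTS = ["Googlebot", "Bingbot"]
KNOWN_AGENTS = KNOWN_AI_AGENTS | {agent.lower() for agent in SEARCH_AGENTS}


def lint_robots(text: str) -> list[str]:
    """Return warnings for unknown user-agent names and blocked search bots."""
    warnings = []
    for raw in text.splitlines():
        # Drop comments, then split "Field: value" on the first colon.
        field, sep, value = raw.split("#", 1)[0].partition(":")
        if not sep or field.strip().lower() != "user-agent":
            continue
        agent = value.strip()
        if agent != "*" and agent.lower() not in KNOWN_AGENTS:
            warnings.append(f"unrecognised user-agent {agent!r} (typo?)")

    # Parse the file as a compliant crawler would and test the search bots.
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    for agent in SEARCH_AGENTS:
        if not parser.can_fetch(agent, "/"):
            warnings.append(f"{agent} is blocked from the home page")
    return warnings


sample = """\
User-agent: GPTBott
Disallow: /

User-agent: *
Disallow: /
"""
for warning in lint_robots(sample):
    print(warning)
```

In the sample, the misspelled GPTBott group matches no real crawler, while the wildcard group shuts out every search bot; both are easy to miss on a quick read of the file.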

how it works

01 · list the bots: Identify the AI user-agents you want to block, or open robot·guard's curated blocklist.
02 · disallow them: Add a User-agent and Disallow: / block for each, or toggle them off in the editor.
03 · keep search allowed: Leave Googlebot, Bingbot and other search crawlers allowed (a bot with no matching Disallow rule may crawl everything), so your SEO is untouched.
04 · publish & revisit: Download the file, place it at your site root so it is served at /robots.txt, and recheck it as new crawlers appear.
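Step 04 is the one that tends to slip. A minimal sketch of the recheck, assuming you have a refreshed blocklist to compare against: collect the user-agent names your published file already mentions (case-insensitively, as the Robots Exclusion Protocol matches them) and list the ones still missing. The function names and the `latest` list are hypothetical.

```python
def named_agents(robots_txt: str) -> set[str]:
    """Collect every User-agent value in a robots.txt body, lower-cased."""
    agents = set()
    for raw in robots_txt.splitlines():
        field, sep, value = raw.split("#", 1)[0].partition(":")
        if sep and field.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def missing_agents(robots_txt: str, blocklist: list[str]) -> list[str]:
    """Return blocklist entries the published file does not name yet."""
    present = named_agents(robots_txt)
    return [agent for agent in blocklist if agent.lower() not in present]


published = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""
# Hypothetical refreshed blocklist, e.g. after new crawlers launch.
latest = ["GPTBot", "CCBot", "ClaudeBot", "PerplexityBot"]
print(missing_agents(published, latest))  # ['ClaudeBot', 'PerplexityBot']
```

Run against your live file on a schedule, anything this prints is a crawler your robots.txt is still letting in.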

frequently asked

Will blocking AI bots stop them completely?

It stops the compliant ones. The major AI crawlers publish a user-agent and honour robots.txt, but the file is a request rather than an access control, so bots that ignore the standard need a firewall or rate-limiting rule instead.
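Because robots.txt enforces nothing, a non-compliant bot has to be stopped where requests arrive. As a minimal sketch, assuming a Python site served through WSGI (PEP 3333), the middleware below answers 403 to any request whose User-Agent contains a listed substring. The tokens and function names are hypothetical, and since a User-Agent header is trivially spoofed, an IP-based firewall or rate limit at your proxy or CDN remains the sturdier layer.

```python
from typing import Callable, Iterable

# Hypothetical user-agent substrings for scrapers seen ignoring robots.txt;
# replace them with tokens taken from your own access logs.
BLOCKED_UA_SUBSTRINGS = ("examplescraper", "badbot")


def block_user_agents(app: Callable) -> Callable:
    """Wrap a WSGI app so listed user-agents get 403 instead of content."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in ua for token in BLOCKED_UA_SUBSTRINGS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


app = block_user_agents(hello_app)
statuses = []


def record_status(status, headers, exc_info=None):
    """Minimal start_response that just remembers the status line."""
    statuses.append(status)


app({"HTTP_USER_AGENT": "ExampleScraper/1.0"}, record_status)
app({"HTTP_USER_AGENT": "Mozilla/5.0 (Windows NT 10.0)"}, record_status)
print(statuses)  # ['403 Forbidden', '200 OK']
```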

Does blocking Google-Extended hurt my search ranking?

No. Google-Extended controls AI training use only; Googlebot handles search. Blocking the former leaves your search presence fully intact.

How do I find new AI crawler user-agents?

Each operator documents its bot, but the list changes often. robot·guard maintains a curated blocklist so you don't have to track every launch.

Can I block AI bots from only part of my site?

Yes. Use Disallow with a specific path (like /blog) under the crawler's user-agent to protect just that section while allowing the rest.


