The rule that blocks an AI crawler
Each block has two parts: a User-agent line naming the bot, and a Disallow line naming what it can't touch. To shut a crawler out of your whole site, write Disallow: / under its user-agent. To block only part of the site, disallow a specific path like /articles. A compliant crawler picks up its block the next time it fetches robots.txt and stops requesting the disallowed paths.
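As a quick illustration of the two-part rule, the sketch below embeds a small robots.txt in a string and checks it with Python's standard urllib.robotparser module. The example.com URLs and the is_allowed helper are assumptions made up for this example, and Python's parser is only one interpretation of the standard, so treat the result as a sanity check rather than a guarantee of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# One block shuts GPTBot out of the whole site; the other blocks
# CCBot from /articles only. Crawlers not named here are unaffected.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /articles
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: would this user-agent be allowed to fetch url?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/"),
        ("CCBot", "https://example.com/articles/post-1"),
        ("CCBot", "https://example.com/about"),
        ("Googlebot", "https://example.com/"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Running the script prints one line per check, which makes it easy to confirm that a full-site block and a path-only block behave differently before you publish the file.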
The common AI training and answer-engine user-agents include GPTBot (OpenAI), ClaudeBot and anthropic-ai (Anthropic), CCBot (Common Crawl, the dataset many models train on), Google-Extended (Google's AI training token, separate from Googlebot), PerplexityBot, and Bytespider (ByteDance). Blocking Google-Extended does not affect your Google Search ranking — it is a distinct token precisely so you can opt out of AI training while staying in search.
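If you maintain the list by hand, generating the blocks from a single table keeps the syntax uniform. The following is a minimal sketch: the AI_CRAWLERS table copies the tokens named above, while the build_blocklist function and its output layout are assumptions for illustration, not the format of any particular tool.

```python
# Tokens and operators as listed in the text above; the table name
# and the comment-per-block layout are illustrative choices.
AI_CRAWLERS = {
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "anthropic-ai": "Anthropic",
    "CCBot": "Common Crawl",
    "Google-Extended": "Google (AI training token)",
    "PerplexityBot": "Perplexity",
    "Bytespider": "ByteDance",
}


def build_blocklist(crawlers: dict) -> str:
    """Return robots.txt text with one full-site Disallow block per crawler."""
    blocks = []
    for token, operator in crawlers.items():
        # A '#' line is a robots.txt comment; it records who runs the bot.
        blocks.append(f"# {operator}\nUser-agent: {token}\nDisallow: /")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_blocklist(AI_CRAWLERS))
```

Because Googlebot never appears in the table, the generated text has no rule for it, which is what keeps search crawling unaffected.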
Why a manager beats a static list
A hand-written blocklist is a snapshot that starts going stale the moment you save it. New AI crawlers appear regularly, each with its own user-agent, and your file silently keeps letting them in until you notice and add them. There is also the everyday risk of a typo blocking the wrong bot.
robot.guard handles both. It keeps a curated, maintained list of known AI scrapers — with who runs each one and what it does — so you toggle them off instead of hunting documentation, and it writes valid directives so you never block Googlebot by accident. You stay in charge of which ones to block; the tool just keeps the menu current and the syntax correct.
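The "never block Googlebot by accident" check can be approximated with a small lint. This is a sketch under stated assumptions, not robot.guard's implementation: the SEARCH_CRAWLERS tuple, the blocked_search_crawlers function, and the example.com probe URL are all hypothetical, and the check relies on how Python's urllib.robotparser interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Search crawlers that should stay allowed (illustrative, not exhaustive).
SEARCH_CRAWLERS = ("Googlebot", "Bingbot")


def blocked_search_crawlers(robots_txt: str,
                            probe_url: str = "https://example.com/") -> list:
    """Return the search crawlers that this robots.txt would shut out."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in SEARCH_CRAWLERS
            if not parser.can_fetch(bot, probe_url)]


if __name__ == "__main__":
    # A wildcard block is the classic accident: it catches every crawler.
    risky = "User-agent: *\nDisallow: /\n"
    print(blocked_search_crawlers(risky))
```

Running a check like this before publishing catches the common mistake of a wildcard or mistyped user-agent that blocks more than you intended.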
how it works
- 01
list the bots
Identify the AI user-agents you want to block — or open robot.guard's curated blocklist.
- 02
disallow them
Add a User-agent and Disallow: / block for each, or toggle them off in the editor.
- 03
keep search allowed
Leave Googlebot, Bingbot and friends on Allow so your SEO is untouched.
- 04
publish & revisit
Download the file, upload it to your site root so it is served at /robots.txt, and recheck as new crawlers appear.
frequently asked
- Will blocking AI bots stop them completely?
- It stops the compliant ones — the major AI crawlers publish a user-agent and honour robots.txt. Bots that ignore the standard need a firewall or rate-limiting rule instead.
- Does blocking Google-Extended hurt my search ranking?
- No. Google-Extended controls AI training use only; Googlebot handles search. Blocking the former leaves your search presence fully intact.
- How do I find new AI crawler user-agents?
- Each operator documents its bot, but the list changes often. robot.guard maintains a curated blocklist so you don't have to track every launch.
- Can I block AI bots from only part of my site?
- Yes. Use Disallow with a specific path (like /blog) under the crawler's user-agent to protect just that section while allowing the rest.
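For the FAQ's point about bots that ignore robots.txt, a server-side filter is one option. The sketch below only shows the matching logic: BLOCKED_TOKENS and should_reject are hypothetical names, and wiring the check into a real firewall, reverse proxy, or web framework is left out because it depends on your stack.

```python
from typing import Optional

# Tokens to refuse at the server (illustrative selection).
BLOCKED_TOKENS = ("GPTBot", "CCBot", "Bytespider")


def should_reject(user_agent_header: Optional[str]) -> bool:
    """Return True if the request's User-Agent header names a blocked bot."""
    if not user_agent_header:
        return False
    header = user_agent_header.lower()
    # Case-insensitive substring match against each blocked token.
    return any(token.lower() in header for token in BLOCKED_TOKENS)


if __name__ == "__main__":
    print(should_reject("Mozilla/5.0 (compatible; Bytespider)"))
```

A bot that ignores robots.txt may also send a misleading User-Agent header, so a filter like this is a first pass at best; rate limiting covers what it misses. Google-Extended is left out of the tuple because it is a robots.txt control token rather than a string that appears in request headers.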
Last updated June 9, 2026