
How to block specific AI bots from scraping your website

the short answer

To block AI bots, add a block to your robots.txt for each AI crawler: a User-agent line naming it (GPTBot, ClaudeBot, CCBot, Google-Extended, PerplexityBot, Bytespider, and others) followed by Disallow: /. Compliant crawlers stop fetching your pages, and robot.guard maintains the user-agent list so you don't have to track new ones by hand.

Blocking an AI bot in robots.txt is mechanically simple: you name its user-agent and disallow the paths you want to protect. The hard part is knowing which user-agents to name, keeping that list current as new crawlers launch, and doing it without accidentally blocking the search bots you depend on.

Here is the practical method, the exact rules, and where the manual approach falls down.

GPTBot: one of dozens of AI crawler user-agents you'd otherwise track by hand.

robot.guard (robotguard.ogbuilds.ai)

curated ai scraper blocklist

kept current as new crawlers appear. toggle the ones you want shut out.

5 of 6 blocked

user-agent        operator        purpose
GPTBot            OpenAI          model training
ClaudeBot         Anthropic       model training
CCBot             Common Crawl    training dataset
Google-Extended   Google          Gemini training
Bytespider        ByteDance       model training
PerplexityBot     Perplexity      answer engine

tip: robots.txt is a polite request. Pair it with a firewall rule for crawlers that ignore it.

where this happens in the app

robot.guard keeps a curated, maintained list of ai crawler user-agents — who runs each one and what it's for — so you block them with a toggle instead of hunting documentation.

  1. each ai crawler by user-agent (gptbot, claudebot, ccbot, google-extended), with its operator and purpose.
  2. one toggle writes the disallow block; the list stays current as new crawlers launch.

The rule that blocks an AI crawler

Each block has two parts: the User-agent line naming the bot, and a Disallow line naming what it can't touch. To shut a crawler out of your whole site you write Disallow: / under its user-agent. To block only part of it, disallow a specific path like /articles. The compliant crawler reads its block on its next visit and stops fetching the disallowed paths.
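As a minimal sketch of that two-part structure, the Python helper below builds the text of one block. The name render_block is hypothetical and chosen for illustration; it is not part of robot.guard. With no paths given it disallows the whole site, and with a path such as /articles it blocks only that section.

```python
def render_block(user_agent, paths=("/",)):
    """Return one robots.txt block: a User-agent line, then one Disallow line per path."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"


# Whole-site block for one crawler.
print(render_block("GPTBot"))

# Partial block: only /articles is off limits for this crawler.
print(render_block("CCBot", paths=["/articles"]))
```

Separate the blocks with a blank line when you combine several of them into one robots.txt file.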

The common AI training and answer-engine user-agents include GPTBot (OpenAI), ClaudeBot and anthropic-ai (Anthropic), CCBot (Common Crawl, the dataset many models train on), Google-Extended (Google's AI training token, separate from Googlebot), PerplexityBot, and Bytespider (ByteDance). Blocking Google-Extended does not affect your Google Search ranking — it is a distinct token precisely so you can opt out of AI training while staying in search.
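One way to convince yourself that blocking these tokens leaves search crawlers alone is to parse the rules with Python's standard urllib.robotparser and ask it who may fetch a page. The sketch below is illustrative only: the helper names and the example.com URL are assumptions, and the standard-library parser approximates how a compliant crawler matches rules rather than reproducing any one operator's implementation.

```python
from urllib.robotparser import RobotFileParser

# User-agents named in this article; not an exhaustive list.
AI_AGENTS = ["GPTBot", "ClaudeBot", "anthropic-ai", "CCBot",
             "Google-Extended", "PerplexityBot", "Bytespider"]


def build_robots_txt(agents):
    """Join one whole-site Disallow block per agent, separated by blank lines."""
    return "\n".join(f"User-agent: {agent}\nDisallow: /\n" for agent in agents)


def can_fetch(robots_txt, user_agent, url):
    """Ask the standard-library parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AI_AGENTS)
url = "https://example.com/articles/post"  # placeholder URL

print("Google-Extended allowed:", can_fetch(robots_txt, "Google-Extended", url))
print("Googlebot allowed:", can_fetch(robots_txt, "Googlebot", url))
```

With these rules the parser reports Google-Extended as blocked and Googlebot as allowed, because no block names Googlebot and crawlers without a matching block are allowed by default.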

Why a manager beats a static list

A hand-written blocklist is a snapshot that starts going stale the moment you save it. New AI crawlers appear regularly, each with its own user-agent, and your file silently keeps letting them in until you notice and add them. There is also the everyday risk of a typo blocking the wrong bot.

robot.guard handles both. It keeps a curated, maintained list of known AI scrapers — with who runs each one and what it does — so you toggle them off instead of hunting documentation, and it writes valid directives so you never block Googlebot by accident. You stay in charge of which ones to block; the tool just keeps the menu current and the syntax correct.
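If you do maintain the file by hand, a small audit script can catch both failure modes. The sketch below is an assumption-laden illustration, not how robot.guard works: it compares the user-agents named in a robots.txt against a known list (here, just the names from this article, which will itself go stale) and uses difflib from the standard library to flag names that look like typos. It only checks which agents are named, not whether their Disallow rules are correct.

```python
import difflib

# Snapshot of the user-agents named in this article; any static list goes stale.
KNOWN_AI_AGENTS = ["GPTBot", "ClaudeBot", "anthropic-ai", "CCBot",
                   "Google-Extended", "PerplexityBot", "Bytespider"]


def named_agents(robots_txt):
    """Return every user-agent value named in the file, in order of appearance."""
    agents = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent" and value.strip():
            agents.append(value.strip())
    return agents


def audit(robots_txt, known=KNOWN_AI_AGENTS):
    """Return (known agents not named in the file, likely typos mapped to the intended name)."""
    known_lower = {name.lower(): name for name in known}
    present = named_agents(robots_txt)
    present_lower = {name.lower() for name in present}
    missing = [name for name in known if name.lower() not in present_lower]
    suspects = {}
    for name in present:
        if name == "*" or name.lower() in known_lower:
            continue
        close = difflib.get_close_matches(name.lower(), list(known_lower), n=1, cutoff=0.8)
        if close:
            suspects[name] = known_lower[close[0]]
    return missing, suspects


# A hand-written file with a typo (GTPBot) and most crawlers missing.
stale_file = """\
User-agent: GTPBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

missing, suspects = audit(stale_file)
print("not named yet:", missing)
print("possible typos:", suspects)
```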

how it works

  1. list the bots

    Identify the AI user-agents you want to block, or open robot.guard's curated blocklist.

  2. disallow them

    Add User-agent and Disallow: / blocks for each, or toggle them off in the editor.

  3. keep search allowed

    Leave Googlebot, Bingbot and friends on Allow so your SEO is untouched.

  4. publish & revisit

    Download the file to your site root, and recheck as new crawlers appear.

frequently asked

Will blocking AI bots stop them completely?
It stops the compliant ones — the major AI crawlers publish a user-agent and honour robots.txt. Bots that ignore the standard need a firewall or rate-limiting rule instead.
Does blocking Google-Extended hurt my search ranking?
No. Google-Extended controls AI training use only; Googlebot handles search. Blocking the former leaves your search presence fully intact.
How do I find new AI crawler user-agents?
Each operator documents its bot, but the list changes often. robot.guard maintains a curated blocklist so you don't have to track every launch.
Can I block AI bots from only part of my site?
Yes. Use Disallow with a specific path (like /blog) under the crawler's user-agent to protect just that section while allowing the rest.
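
For bots that ignore robots.txt, the first answer above points to a firewall or rate-limiting rule. Where a firewall is not available, one application-level stand-in is to refuse requests by User-Agent header. The sketch below assumes a Python WSGI application; block_ai_agents and demo_app are hypothetical names used for illustration. A crawler that spoofs its user-agent will get past this check, so treat it as a complement to network-level rules rather than a replacement.

```python
# Lower-cased tokens to look for in the User-Agent header (an illustrative list).
# Google-Extended is left out: it is a robots.txt control token, not a
# user-agent string that appears in HTTP requests.
BLOCKED_AGENT_TOKENS = ("gptbot", "claudebot", "ccbot", "perplexitybot", "bytespider")


def block_ai_agents(app, tokens=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app so requests whose User-Agent contains a blocked token get a 403."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in tokens):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [("Content-Type", "text/plain"),
                                             ("Content-Length", str(len(body)))])
            return [body]
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in for your real site."""
    body = b"hello\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


guarded = block_ai_agents(demo_app)
```

Any WSGI server can serve guarded in place of the original app, for example wsgiref.simple_server.make_server from the standard library.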

Last updated June 9, 2026
