
robots.txt for AI crawlers: GPTBot, ClaudeBot & more

the short answer

AI answer engines crawl the web with named bots: GPTBot and OAI-SearchBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and others. Google-Extended is not a separate crawler but a robots.txt token that controls whether Google may use your content for AI training and grounding. The major engines honor robots.txt. To be eligible for citation in AI answers, you must not block these user-agents; to opt out of AI training or grounding, disallow them specifically. The most common mistake is unintentionally blocking them (or your sitemap) and then wondering why you're never cited.

AI answer engines crawl the web with named bots, and the major ones honor robots.txt. That makes robots.txt the single file that decides whether you're even eligible to be cited in an AI answer.

Knowing each engine's user-agent lets you make a deliberate choice per engine — allow the ones you want, opt out of the ones you don't — instead of an accidental blanket block. Here's how to set it up and how to verify it.

August 2023: when OpenAI published GPTBot and its robots.txt token, formalizing AI crawler opt-in/opt-out via the robots standard. Source: OpenAI GPTBot documentation, 2023.

The AI crawlers you should know

Each engine uses one or more named user-agents, and each one does a different job.

GPTBot is OpenAI's crawler for collecting training data. OAI-SearchBot surfaces sites in ChatGPT search results. ChatGPT-User fetches a page when a user asks ChatGPT about it live. ClaudeBot is Anthropic's crawler (older tokens include Claude-Web and anthropic-ai). PerplexityBot is Perplexity's indexing crawler. Google-Extended is not a crawler but a robots.txt token that controls whether Google may use your content for AI training and grounding; it is separate from Googlebot.

Allow them explicitly to be citable

If you want AI answer engines to cite you, allow these bots. A simple, permissive robots.txt that allows all user-agents already covers them, but being explicit documents your intent and avoids surprises if you later tighten rules.

Critically, don't block them by accident. A disallow rule meant for one path, an overly broad pattern, or a wildcard left over from a migration can quietly remove you from every AI answer. The practical baseline: allow GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, and Google-Extended to be eligible for AI citations; keep a `User-agent: *` baseline with `Allow: /` unless you have a reason not to; and always reference your sitemap with a `Sitemap:` line. After any robots.txt change, verify it.
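As a rough illustration of that baseline, and of how a leftover wildcard rule undoes it, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The `example.com` URLs, the two sample files, and the helper name `blocked_bots` are assumptions made for the example, not part of any tool mentioned here.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

# Permissive baseline: an explicit group documents intent, the wildcard
# group covers everyone else, and the sitemap is referenced.
# (example.com is a placeholder domain.)
BASELINE = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

# A staging rule that survived a migration. No bot has its own group,
# so the wildcard block applies to all of them.
LEFTOVER = """\
User-agent: *
Disallow: /
"""


def blocked_bots(robots_txt, bots=AI_BOTS, url="https://example.com/"):
    """Return the bots that this robots.txt text stops from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


print(blocked_bots(BASELINE))  # []
print(blocked_bots(LEFTOVER))  # all five bot names
```

The second result is the accidental blanket block: nothing in the file names an AI crawler, yet every one of them is shut out.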

Or opt out deliberately

If you don't want your content used for AI training or answers, disallow the specific bots: `User-agent: GPTBot` then `Disallow: /`, and the same for the others. Note that blocking Google-Extended opts you out of Google's AI training and grounding without affecting Googlebot crawling or normal search ranking, so you can keep classic search while opting out of AI.
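A deliberate opt-out can be sketched the same way. The snippet below is one possible approach, not a prescribed one: it generates a robots.txt that disallows only the bots you list and then checks the result with Python's `urllib.robotparser`. The function name `build_opt_out`, the chosen bots, and the `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_opt_out(bots, sitemap_url):
    """Return robots.txt text that blocks the listed bots and allows everyone else."""
    lines = []
    for bot in bots:
        # One group per bot: "User-agent: <bot>" followed by "Disallow: /".
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    lines += ["User-agent: *", "Allow: /", "", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Hypothetical choice: opt out of OpenAI training and Google AI use only.
robots_txt = build_opt_out(
    ["GPTBot", "Google-Extended"],
    "https://example.com/sitemap.xml",
)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

page = "https://example.com/pricing"
for agent in ["GPTBot", "Google-Extended", "Googlebot", "ClaudeBot"]:
    print(agent, parser.can_fetch(agent, page))
# GPTBot False
# Google-Extended False
# Googlebot True
# ClaudeBot True
```

Googlebot stays allowed because the Google-Extended group only matches that token, which is what lets classic search continue while the AI opt-out is in effect.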

seo·check reads your robots.txt as part of its audit, so you can confirm AI crawlers aren't blocked — or that your opt-out is actually in effect.

how it works

  1. Identify the AI user-agents

    Know the named bots: GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, and Google-Extended.

  2. Decide allow or opt out

    Allow them to be eligible for AI citations, or disallow specific bots to opt out of AI training/grounding.

  3. Reference your sitemap

    Keep a `User-agent: *` baseline with `Allow: /` and add a `Sitemap:` line so crawlers can discover your URLs.

  4. Verify the result

    After any change, fetch your robots.txt, or run the page through seo·check, to confirm the AI bots aren't blocked by accident.
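The verification step can also be scripted. The following is a minimal sketch, assuming Python 3.8+ and only the standard library; `audit`, `fetch_robots`, the sample file, and `example.com` are illustrative names, and this is not how seo·check itself works.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

# The six user-agent tokens from step 1.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "PerplexityBot", "Google-Extended",
]


def audit(robots_txt, site="https://example.com/"):
    """Report which AI agents may fetch the site root and which sitemaps are listed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "allowed": [a for a in AI_AGENTS if parser.can_fetch(a, site)],
        "blocked": [a for a in AI_AGENTS if not parser.can_fetch(a, site)],
        "sitemaps": parser.site_maps() or [],  # site_maps() returns None when empty
    }


def fetch_robots(site):
    """Download robots.txt from a live site (raises on HTTP or network errors)."""
    with urlopen(site.rstrip("/") + "/robots.txt", timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Inline sample so the sketch runs without network access.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

report = audit(SAMPLE)
print("blocked:", report["blocked"])    # blocked: ['GPTBot']
print("sitemaps:", report["sitemaps"])  # sitemaps: ['https://example.com/sitemap.xml']
```

To check a real site, pass the output of `fetch_robots("https://yoursite.com")` to `audit`. Treat the result as a first check rather than a verdict: Python's parser applies the first matching rule and matches plain path prefixes, so it may disagree with a crawler's own parser on files that rely on wildcards or rule precedence.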

frequently asked

If I block GPTBot, do I lose normal Google ranking?
No. GPTBot is OpenAI's bot. Google's regular crawler is Googlebot, and its AI-specific control is Google-Extended. Blocking GPTBot only affects OpenAI; blocking Google-Extended only affects Google's AI training and grounding, not standard ranking.
Do all AI crawlers actually respect robots.txt?
The operators behind the major named tokens (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) publicly state that they honor robots.txt. Some smaller or non-compliant scrapers may not. For those, robots.txt is advisory only, and blocking at the network level (for example, by user-agent or IP at your firewall or CDN) is the stronger control.
How do I check my robots.txt is set up right?
Fetch yoursite.com/robots.txt and confirm the AI bots aren't disallowed and your sitemap is referenced. seo·check checks robots.txt and sitemap discovery automatically when you run an audit.

Published April 7, 2026 · Last updated June 16, 2026
