The AI crawlers you should know
Each engine uses one or more named user-agents. Knowing them lets you make a deliberate choice per engine instead of an accidental blanket block.
- GPTBot is OpenAI's training/crawl bot.
- OAI-SearchBot surfaces sites in ChatGPT search results.
- ChatGPT-User fetches a page live when a user asks ChatGPT about it.
- ClaudeBot is Anthropic's crawler (historically also Claude-Web and anthropic-ai).
- PerplexityBot is Perplexity's indexing crawler.
- Google-Extended is Google's robots.txt token for controlling AI training and grounding. It is separate from Googlebot and is not a crawler of its own: Google's existing crawlers do the fetching, and the token only controls how the content may be used.
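To see which of these engines you have actually made a decision about, you can list the tokens your robots.txt names explicitly. Anything not named falls back to the `User-agent: *` group. The following is a minimal Python sketch under that assumption. `AI_USER_AGENTS` and `explicit_ai_groups` are hypothetical names used for illustration, and the check is an exact, case-insensitive token comparison rather than a full robots.txt parser.

```python
# Hypothetical reference table of the AI tokens discussed above.
AI_USER_AGENTS = {
    "GPTBot": "OpenAI, training/crawl",
    "OAI-SearchBot": "OpenAI, ChatGPT search results",
    "ChatGPT-User": "OpenAI, live fetch on a user's request",
    "ClaudeBot": "Anthropic, crawler (older tokens: Claude-Web, anthropic-ai)",
    "PerplexityBot": "Perplexity, indexing crawler",
    "Google-Extended": "Google, AI training/grounding control token",
}


def explicit_ai_groups(robots_txt: str) -> dict[str, bool]:
    """Map each AI token to True if a User-agent line names it explicitly.

    Tokens mapped to False inherit whatever the `User-agent: *` group says.
    """
    named = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            named.add(value.strip().lower())  # tokens match case-insensitively
    return {token: token.lower() in named for token in AI_USER_AGENTS}


sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
for token, explicit in explicit_ai_groups(sample).items():
    print(f"{token:16} {'explicit' if explicit else 'inherits *'}")
```

With this sample, only GPTBot is reported as explicit. Every other engine silently inherits the wildcard group.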
Allow them explicitly to be citable
If you want AI answer engines to cite you, allow these bots. A simple, permissive robots.txt that allows all user-agents already covers them, but being explicit documents your intent and avoids surprises if you later tighten rules.
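The difference between relying on the wildcard and naming the bots shows up when the wildcard group is later tightened. The sketch below compares three small files using Python's standard `urllib.robotparser`. The `allowed` helper is a hypothetical convenience wrapper. One caveat applies: this parser uses the first matching rule in file order rather than the longest-match rule from RFC 9309. Treat it as a sanity check, not an exact model of each engine's crawler.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def allowed(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Parse robots.txt text with the standard library and test one agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Permissive: the wildcard group alone already covers every AI bot.
PERMISSIVE = """\
User-agent: *
Allow: /
"""

# The same file after the wildcard group is tightened.
TIGHTENED = """\
User-agent: *
Disallow: /
"""

# Explicit: the AI bots have their own group, so tightening * leaves them alone.
EXPLICIT_THEN_TIGHTENED = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /

User-agent: *
Disallow: /
"""

for name, text in [("permissive", PERMISSIVE), ("tightened", TIGHTENED),
                   ("explicit, then tightened", EXPLICIT_THEN_TIGHTENED)]:
    ok = [bot for bot in AI_BOTS if allowed(text, bot)]
    print(f"{name}: {len(ok)}/{len(AI_BOTS)} AI bots allowed")
```

This prints 5/5, 0/5 and 5/5. Tightening the wildcard removes every AI bot from the permissive-only file, but leaves the explicitly named bots untouched.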
Critically, don't block them by accident. A disallow rule meant for one path, an overly broad pattern, or a wildcard left over from a migration can quietly remove some or all of your pages from AI answers. The practical baseline:
- Allow GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, and Google-Extended, so you stay eligible for AI citations.
- Keep a `User-agent: *` baseline with `Allow: /` unless you have a reason not to.
- Always reference your sitemap with a `Sitemap:` line.
After any robots.txt change, fetch the live file and confirm the result.
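As a concrete starting point, the baseline above could look like the robots.txt below, checked with Python's `urllib.robotparser`. Note that `site_maps()` needs Python 3.8 or later. The `example.com` sitemap URL is a placeholder for your own. Grouping the AI tokens under one shared `Allow: /` is one layout choice; separate groups per bot work just as well.

```python
from urllib.robotparser import RobotFileParser

# Baseline robots.txt; the sitemap URL is a placeholder.
BASELINE = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(BASELINE.splitlines())

# Every named AI bot may fetch the homepage and a deeper URL.
for bot in ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]:
    assert parser.can_fetch(bot, "https://example.com/")
    assert parser.can_fetch(bot, "https://example.com/blog/post")

# site_maps() returns the Sitemap: URLs, or None if the file has none.
print(parser.site_maps())
```

Running this prints `['https://example.com/sitemap.xml']`. If you delete the `Sitemap:` line, it prints `None`.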
Or opt out deliberately
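If you don't want your content used for AI training or answers, disallow the specific bots: `User-agent: GPTBot` followed by `Disallow: /`, and the same for each of the others. Blocking Google-Extended stops Google from using your content to train and ground its Gemini models without affecting normal Googlebot crawling or ranking, so you can keep classic search while opting out of that AI use. It does not, however, remove you from AI Overviews in Google Search, which draw on the regular Search index.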
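Here is a sketch of the opt-out version, again checked with `urllib.robotparser`. Each of the five AI tokens gets its own `Disallow: /` group while the wildcard group stays open. Googlebot has no group of its own here, so it is still allowed. Google-Extended is a control token rather than a separate fetcher, so this only shows how the rules are scoped, not how Google applies them. Drop the group for any engine you want to keep. The sitemap URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# One Disallow group per AI token; everything else falls through to *.
OPT_OUT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(OPT_OUT.splitlines())

for agent in ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
              "Google-Extended", "Googlebot"]:
    status = "allowed" if parser.can_fetch(agent, "/") else "blocked"
    print(f"{agent:16} {status}")
```

This should report the five AI tokens as blocked and Googlebot as allowed.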
seo·check reads your robots.txt as part of its audit, so you can confirm AI crawlers aren't blocked — or that your opt-out is actually in effect.
how it works
- 01
Identify the AI user-agents
Know the named bots: GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, and Google-Extended.
- 02
Decide allow or opt out
Allow them to be eligible for AI citations, or disallow specific bots to opt out of AI training/grounding.
- 03
Reference your sitemap
Keep a `User-agent: *` baseline with `Allow: /` and add a `Sitemap:` line so crawlers can discover your URLs.
- 04
Verify the result
After any change, fetch your robots.txt — or run the page through seo·check — to confirm the AI bots aren't blocked by accident.
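Steps 02 to 04 can be rolled into one offline check. `audit_robots` below is a hypothetical helper, not seo·check's implementation. It parses robots.txt text with `urllib.robotparser`, tests each AI token against a few paths you care about, and flags a missing `Sitemap:` line.

The example shows the "overly broad pattern" failure. `Disallow: /b`, meant for `/beta/`, also blocks `/blog/`, because robots.txt paths are prefix matches. Python's parser applies the first matching rule in file order, whereas RFC 9309 uses the longest match. For that reason, the sample files list `Disallow` lines before `Allow: /` so both approaches give the same answer.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
           "PerplexityBot", "Google-Extended")


def audit_robots(robots_txt: str, paths=("/",)) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for bot in AI_BOTS:
        for path in paths:
            if not parser.can_fetch(bot, path):
                problems.append(f"{bot} is blocked from {path}")
    if not parser.site_maps():  # None when there is no Sitemap: line
        problems.append("no Sitemap: line found")
    return problems


# "/b" was meant for /beta/, but as a prefix it also matches /blog/.
BROKEN = """\
User-agent: *
Disallow: /b
"""

FIXED = """\
User-agent: *
Disallow: /beta/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

print(audit_robots(BROKEN, paths=("/", "/blog/")))  # six "/blog/" blocks plus the sitemap warning
print(audit_robots(FIXED, paths=("/", "/blog/")))   # []
```

Passing the URLs you most want cited as `paths` catches rules that leave the homepage reachable but quietly block a whole section.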
frequently asked
- If I block GPTBot, do I lose normal Google ranking?
- No. GPTBot is OpenAI's bot. Google's regular crawler is Googlebot, and its AI-specific control is Google-Extended. Blocking GPTBot only affects OpenAI. Blocking Google-Extended only affects whether Google may use your content for Gemini training and grounding, not standard Search ranking.
- Do all AI crawlers actually respect robots.txt?
- The major, named ones (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) publicly state that they honor robots.txt. Some smaller or non-compliant scrapers may not. robots.txt is a voluntary convention, not access control, so it cannot stop those scrapers; network-level blocking is the harder control.
- How do I check my robots.txt is set up right?
- Fetch yoursite.com/robots.txt and confirm the AI bots aren't disallowed and your sitemap is referenced. seo·check checks robots.txt and sitemap discovery automatically when you run an audit.
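To run a similar check against the live file, `RobotFileParser` can fetch it for you. This sketch assumes network access. `robots_url` and `check_live` are hypothetical helper names, and `https` is assumed when you pass a bare domain. `read()` treats a 401 or 403 response as "everything disallowed" and any other 4xx (such as a 404) as "everything allowed". It raises on DNS or connection errors. Some servers reject urllib's default user agent, which would show up here as every bot blocked.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

AI_BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
           "PerplexityBot", "Google-Extended")


def robots_url(site: str) -> str:
    """Build the robots.txt URL from a bare domain or any page URL on the site."""
    if "://" not in site:
        site = "https://" + site  # assumption: the site is served over HTTPS
    parts = urlsplit(site)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def check_live(site: str):
    """Fetch robots.txt over the network and report per-bot access to "/"."""
    parser = RobotFileParser(robots_url(site))
    parser.read()  # 401/403 -> all disallowed; other 4xx -> all allowed
    access = {bot: parser.can_fetch(bot, "/") for bot in AI_BOTS}
    return access, parser.site_maps()
```

For example, `check_live("yoursite.com")` fetches `https://yoursite.com/robots.txt`. It returns the per-bot results together with the sitemap URLs, or `None` if the file lists none.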
Published April 7, 2026 · Last updated June 16, 2026