robots.txt for SEO: whitelisting Googlebot and the crawlers that matter

Let the search crawlers do their job

Search engines need to reach two things: the pages you want indexed, and the CSS and JavaScript needed to render them. A classic SEO own-goal is disallowing /assets or /static to save crawl budget, which leaves Googlebot rendering a broken page and judging it accordingly. Allow your content and your render resources; only disallow genuinely private or duplicate paths like internal search results and faceted-filter URLs.
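You can sanity-check this before publishing with Python's standard-library urllib.robotparser. The sketch below assumes a hypothetical robots.txt that disallows /search and /private/ and leaves /static/ open. It then asks whether Googlebot may fetch a few example paths on example.com, which is a placeholder domain.

This parser has limits worth knowing:
- it uses plain prefix matching;
- it takes the first matching rule in a group;
- it does not understand * wildcards inside paths.

Treat it as a rough check, not a reproduction of Google's own matching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: content and render resources (/static/) stay
# open, while internal search results and a private area are disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /private/
"""


def can_fetch(robots_txt: str, agent: str, path: str,
              base: str = "https://example.com") -> bool:
    """Return True if `agent` may crawl `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, base + path)


if __name__ == "__main__":
    for path in ["/", "/blog/post", "/static/app.css", "/static/app.js",
                 "/search?q=shoes", "/private/report"]:
        print(f"{path:18} allowed={can_fetch(ROBOTS_TXT, 'Googlebot', path)}")
```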

It also helps to declare your sitemap in robots.txt with a Sitemap: line giving the sitemap's full URL. It is not required, but it gives crawlers a direct map of your indexable URLs, which is especially useful on large or frequently updated sites.
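To confirm the Sitemap: line is picked up, the standard-library parser exposes site_maps() in Python 3.8 and later. Here is a minimal sketch with a hypothetical example.com sitemap URL.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a sitemap declaration. The Sitemap: line is
# independent of any User-agent group, so it can sit anywhere in the file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in `robots_txt` (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap: line.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(declared_sitemaps(ROBOTS_TXT))
```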

Keep AI blocks separate from search rules

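The most important SEO safety rule when blocking AI scrapers is to scope each block to its own user-agent. Blocking GPTBot or CCBot has no effect on Googlebot, because they are different user-agents. A sloppy wildcard rule meant for AI bots is another matter. A crawler follows the group that most specifically matches its name and falls back to User-agent: * only when none does. That means a catch-all Disallow also hits Googlebot unless Googlebot has a group of its own. Keep one explicit block per bot and you keep the two concerns cleanly apart.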
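The wildcard trap is easy to demonstrate with Python's urllib.robotparser. In this sketch (example.com is a placeholder), the sloppy file blocks every crawler, Googlebot included. The scoped file blocks only GPTBot and CCBot. The parser's matching is simpler than Google's, but for whole-site Allow and Disallow rules like these the outcome is the same.

```python
from urllib.robotparser import RobotFileParser

# Sloppy: a catch-all group meant for AI scrapers. Googlebot has no group
# of its own, so it falls back to this one and is blocked too.
SLOPPY = """\
User-agent: *
Disallow: /
"""

# Scoped: each AI scraper gets its own group; every other crawler,
# including Googlebot and Bingbot, falls back to the permissive catch-all.
SCOPED = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Return True if `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ["Googlebot", "Bingbot", "GPTBot", "CCBot"]:
        print(f"{agent:10} sloppy={allowed(SLOPPY, agent)} scoped={allowed(SCOPED, agent)}")
```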

This is exactly the kind of mistake a robots.txt manager prevents. robot·guard writes a separate, correctly scoped block for each toggle, so allowing search and blocking AI never collide. The live preview shows you precisely which user-agent each Allow and Disallow applies to before you publish.
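The one-block-per-toggle idea can be sketched in a few lines of Python. This is a hypothetical illustration, not robot·guard's actual code, and both function names are made up for this example:
- build_robots_txt takes a {user_agent: allowed} mapping and writes one group per agent.
- preview lists which user-agent each Allow or Disallow belongs to. It treats a User-agent line that follows rules as the start of a new group.

```python
from __future__ import annotations

from typing import Optional


def build_robots_txt(toggles: dict[str, bool], sitemap: Optional[str] = None) -> str:
    """Hypothetical generator: one User-agent group per toggle.

    toggles maps a user-agent token to True (allow) or False (block).
    A permissive catch-all group is written last.
    """
    groups = []
    for agent, is_allowed in toggles.items():
        rule = "Allow: /" if is_allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    groups.append("User-agent: *\nAllow: /")
    text = "\n\n".join(groups) + "\n"
    if sitemap:
        text += f"\nSitemap: {sitemap}\n"
    return text


def preview(robots_txt: str) -> list[tuple[str, str]]:
    """List (user_agent, rule) pairs so the scope of every rule is explicit."""
    pairs: list[tuple[str, str]] = []
    agents: list[str] = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif key in ("allow", "disallow"):
            in_rules = True
            pairs.extend((agent, f"{key.capitalize()}: {value}") for agent in agents)
    return pairs


if __name__ == "__main__":
    text = build_robots_txt(
        {"Googlebot": True, "Bingbot": True, "GPTBot": False, "CCBot": False},
        sitemap="https://example.com/sitemap.xml",
    )
    print(text)
    for agent, rule in preview(text):
        print(f"{agent:10} {rule}")
```

Because every agent gets its own group, flipping the GPTBot toggle changes only the GPTBot group. The Googlebot group and the catch-all are written the same way regardless.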

how it works

01
allow your content
Make sure indexable pages and render resources (CSS/JS) are not disallowed.
02
whitelist search bots
Explicitly allow Googlebot, Bingbot, and the crawlers that send you traffic.
03
add your sitemap
Include a Sitemap: line so crawlers find your URLs directly.
04
scope AI blocks
Give each AI scraper its own user-agent block so search rules stay untouched.
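The four steps can be turned into a rough pre-publish audit. This sketch makes several assumptions:
- the lists of content paths and bots are hypothetical;
- example.com is a placeholder;
- it relies on urllib.robotparser's simplified prefix matching.

A passing result is therefore a sanity check, not a guarantee of how every crawler reads the file.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Hypothetical inputs: adjust to your own site and the bots you care about.
CONTENT_PATHS = ["/", "/blog/", "/static/site.css", "/static/app.js"]
SEARCH_BOTS = ["Googlebot", "Bingbot"]
AI_BOTS = ["GPTBot", "CCBot"]


def audit(robots_txt: str, base: str = "https://example.com") -> dict[str, bool]:
    """Rough check of a robots.txt against the four steps (prefix matching only)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def ok(agent: str, path: str) -> bool:
        return parser.can_fetch(agent, base + path)

    return {
        # 01: Googlebot can reach pages and their CSS/JS
        "01 allow your content": all(ok("Googlebot", p) for p in CONTENT_PATHS),
        # 02: every search bot can crawl the site root
        "02 whitelist search bots": all(ok(bot, "/") for bot in SEARCH_BOTS),
        # 03: at least one Sitemap: line is declared
        "03 add your sitemap": bool(parser.site_maps()),
        # 04: every AI scraper is blocked from the site root
        "04 scope AI blocks": all(not ok(bot, "/") for bot in AI_BOTS),
    }


GOOD = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""

if __name__ == "__main__":
    for step, passed in audit(GOOD).items():
        print(f"{step:26} {'pass' if passed else 'FAIL'}")
```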

frequently asked

Should I block crawl-budget-wasting paths in robots.txt?

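Sparingly. Disallowing infinite faceted URLs or internal search can help large sites, but blocking real content or render resources hurts. When unsure, prefer noindex over disallow, and do not combine the two on the same URL. A crawler has to fetch a page to see its noindex tag, so a disallowed page's noindex is never read.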
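If you rely on noindex, check that none of those URLs are also disallowed, since a blocked page's noindex tag is never fetched. Here is a minimal sketch, assuming you keep a hypothetical list of noindex URLs. It uses the standard-library parser's simplified prefix matching.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser


def blocked_noindex_urls(robots_txt: str, noindex_urls: list[str],
                         agent: str = "Googlebot") -> list[str]:
    """Return the noindex URLs that `agent` is disallowed from crawling.

    The crawler must fetch a page to read its noindex tag, so any URL
    returned here is one whose noindex will go unseen.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /search\nDisallow: /tag/\n"
    # Hypothetical pages that carry a noindex tag.
    noindex = ["https://example.com/tag/red", "https://example.com/thin-page"]
    print(blocked_noindex_urls(robots, noindex))
```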

Does allowing a bot in robots.txt guarantee indexing?

No. robots.txt controls crawling, not indexing. Allowing Googlebot lets it fetch the page; whether Google actually indexes that page still depends on content quality, noindex tags, and canonicals.

Will blocking AI scrapers lower my rankings?

No, as long as the blocks are scoped to AI user-agents. Search crawlers are separate bots and are unaffected.

Where do I declare my sitemap?

On its own line in robots.txt: Sitemap: https://yoursite.com/sitemap.xml. robot·guard can include it for you when you generate the file.

Last updated June 9, 2026
