
robots.txt for SEO: how to whitelist Googlebot without locking out the rest

the short answer

For SEO, your robots.txt should explicitly allow search crawlers (Googlebot, Bingbot, DuckDuckBot) to reach your indexable content, avoid disallowing CSS/JS that Google needs to render pages, point to your sitemap, and keep AI-scraper blocks separate so they never touch the search bots.

robots.txt is one of the few files where a single careless line can cost you rankings. Disallow the wrong path and Google stops crawling pages you want indexed; block a resource directory and Google can't render your pages properly. Getting it right is less about clever rules and more about not getting in your own way.

Here is how to keep the crawlers that drive your traffic happy, while still using robots.txt for the job everyone now wants it for: turning away the bots that don't.

Googlebot: the one crawler you can least afford to block by accident

Let the search crawlers do their job

Search engines need to reach two things: the pages you want indexed, and the CSS and JavaScript needed to render them. A classic SEO own-goal is disallowing /assets or /static to save crawl budget, which leaves Googlebot rendering a broken page and judging it accordingly. Allow your content and your render resources; only disallow genuinely private or duplicate paths like internal search results and faceted-filter URLs. Keep in mind that robots.txt is publicly readable and is not access control, so truly private pages still need authentication.
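To check that a draft file leaves both content and render resources reachable, you can run it through Python's standard-library urllib.robotparser before publishing. This is a minimal sketch: the site example.com, the paths /search/, /filter/ and /assets/, and the helper blocked_urls are illustrative assumptions. Note that urllib.robotparser uses plain prefix matching and applies the first matching rule, while Google applies the longest matching rule and supports * and $ wildcards, so treat this as a sanity check rather than an exact Googlebot simulation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft robots.txt: duplicate/low-value paths are disallowed,
# while pages and the CSS/JS under /assets/ stay crawlable.
DRAFT = """\
User-agent: *
Disallow: /search/
Disallow: /filter/
"""

# URLs search engines must be able to fetch: a page plus the resources it renders with.
MUST_BE_CRAWLABLE = [
    "https://example.com/blog/robots-txt-guide",
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
]


def blocked_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the URLs from `urls` that `user_agent` is NOT allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    problems = blocked_urls(DRAFT, "Googlebot", MUST_BE_CRAWLABLE)
    print(problems or "all content and render resources are crawlable")
    # The classic own-goal: appending "Disallow: /assets/" hides the CSS/JS.
    print(blocked_urls(DRAFT + "Disallow: /assets/\n", "Googlebot", MUST_BE_CRAWLABLE))
```

Running the second check is a quick way to see the own-goal in action: both resource URLs come back as blocked while the article itself stays crawlable.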

It also helps to declare your sitemap in robots.txt with a Sitemap: line. It is not required, but it gives crawlers a direct map of your indexable URLs, which is especially useful on large or frequently-updated sites.
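A quick way to confirm the sitemap is discoverable is to parse the file and read its Sitemap: entries back out. In Python 3.8+, RobotFileParser.site_maps() returns them as a list, or None when there are none. The URL below is a placeholder; Google expects a fully qualified URL on the Sitemap: line, and declared_sitemaps is a helper name made up for this sketch.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /search/

# The Sitemap line is independent of any User-agent group.
Sitemap: https://example.com/sitemap.xml
"""


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return every Sitemap: URL declared in the file (an empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


if __name__ == "__main__":
    print(declared_sitemaps(ROBOTS_TXT))
```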

Keep AI blocks separate from search rules

The most important SEO safety rule when blocking AI scrapers is to scope each block to its own user-agent. Blocking GPTBot or CCBot has no effect on Googlebot because they are different user-agents, but a sloppy wildcard rule meant for AI bots can catch search crawlers too: a "User-agent: *" group containing "Disallow: /" applies to every crawler that has no group of its own, Googlebot included. Keep one explicit block per bot and you keep the two concerns cleanly apart.
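The difference is easy to demonstrate. In the sketch below, a scoped file gives GPTBot and CCBot their own groups, while a "sloppy" variant puts the AI block under the wildcard group. Crawlers obey the group that names them and fall back to * only when none does, so the wildcard version shuts out Googlebot as well. Both files and the helper can_crawl are illustrative assumptions, checked with Python's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Each AI scraper gets its own group; search bots fall through to "*".
SCOPED = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /search/
"""

# A "sloppy" file: the AI block was written against the wildcard group.
SLOPPY = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """True if `user_agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/pricing"
    for bot in ("Googlebot", "Bingbot", "GPTBot", "CCBot"):
        print(f"{bot:10} scoped={can_crawl(SCOPED, bot, page)} "
              f"sloppy={can_crawl(SLOPPY, bot, page)}")
```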

This is exactly the kind of mistake a robots.txt manager prevents. robot.guard writes a separate, correctly-scoped block for each toggle, so allowing search and blocking AI never collide. The live preview shows you precisely which user-agent each Allow and Disallow applies to before you publish.
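The idea behind such a preview can be sketched as expanding each group into one line per user-agent and rule. This is a hypothetical illustration, not robot.guard's actual code: the preview function and Rule type are names made up for this example, and the sketch only handles User-agent, Allow and Disallow lines, skipping Sitemap and anything else.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    user_agent: str
    directive: str  # "Allow" or "Disallow"
    path: str


def preview(robots_txt: str) -> list[Rule]:
    """List every Allow/Disallow rule together with each user-agent it applies to.

    Consecutive User-agent lines form one group, and the group's rules apply to
    all of them. Lines outside Allow/Disallow/User-agent are skipped.
    """
    rules: list[Rule] = []
    agents: list[str] = []
    in_rules = False  # True once the current group has seen an Allow/Disallow
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                rules.append(Rule(agent, field.capitalize(), value))
    return rules


SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /search/
Sitemap: https://example.com/sitemap.xml
"""

if __name__ == "__main__":
    for rule in preview(SAMPLE):
        print(f"{rule.user_agent:10} {rule.directive:8} {rule.path}")
```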

how it works

  1. allow your content

     Make sure indexable pages and render resources (CSS/JS) are not disallowed.

  2. whitelist search bots

     Explicitly allow Googlebot, Bingbot, and the crawlers that send you traffic.

  3. add your sitemap

     Include a Sitemap: line so crawlers find your URLs directly.

  4. scope AI blocks

     Give each AI scraper its own user-agent block so search rules stay untouched.
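Put together, the four steps can be sketched as a small generator. This is an assumption-laden illustration rather than how robot.guard builds its files: build_robots_txt, the bot lists, the blocked paths and the sitemap URL are all hypothetical. One design point matters: a crawler that finds a group naming it ignores the * group entirely, so any paths you still want excluded have to be repeated inside each explicit search-bot group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical inputs: swap in your own bots, paths and sitemap URL.
SEARCH_BOTS = ["Googlebot", "Bingbot", "DuckDuckBot"]
AI_BOTS = ["GPTBot", "CCBot"]
BLOCKED_PATHS = ["/search/", "/filter/"]  # never list /assets/ or other CSS/JS paths
SITEMAP_URL = "https://example.com/sitemap.xml"


def build_robots_txt(search_bots, ai_bots, blocked_paths, sitemap_url):
    """Assemble a robots.txt following the four steps: allow, whitelist, sitemap, scope."""
    groups = []
    # Steps 1-2: one explicit group per search bot. A crawler that finds a group
    # naming it ignores the "*" group, so the blocked paths are repeated here.
    for bot in search_bots:
        lines = [f"User-agent: {bot}"]
        lines += [f"Disallow: {path}" for path in blocked_paths]
        # "Allow: /" goes last: Google applies the longest matching rule, while
        # simpler parsers such as urllib.robotparser take the first match;
        # this order gives the same answer under both.
        lines.append("Allow: /")
        groups.append("\n".join(lines))
    # Step 4: each AI scraper gets its own group, so no AI rule touches search bots.
    for bot in ai_bots:
        groups.append(f"User-agent: {bot}\nDisallow: /")
    # Everyone else: the same blocked paths, with content and resources left open.
    groups.append("\n".join(["User-agent: *"] + [f"Disallow: {p}" for p in blocked_paths]))
    # Step 3: the Sitemap line is independent of any group, so it goes at the end.
    return "\n\n".join(groups) + f"\n\nSitemap: {sitemap_url}\n"


if __name__ == "__main__":
    robots_txt = build_robots_txt(SEARCH_BOTS, AI_BOTS, BLOCKED_PATHS, SITEMAP_URL)
    print(robots_txt)
    # Quick local sanity check before publishing.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    for bot in ("Googlebot", "GPTBot"):
        print(bot, parser.can_fetch(bot, "https://example.com/assets/app.js"))
```

Repeating the blocked paths per group makes the file longer, but it keeps every search bot's rules readable in one place and avoids the surprise of an explicit Googlebot group silently dropping the wildcard exclusions.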

frequently asked

Should I block crawl-budget-wasting paths in robots.txt?
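Sparingly. Disallowing infinite faceted URLs or internal search can help large sites, but blocking real content or render resources hurts. When unsure, prefer noindex over disallow: a disallowed page can still be indexed from links elsewhere, and crawlers can only see a noindex tag on pages they are allowed to fetch.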
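One common way real content gets blocked by accident is prefix matching: a Disallow value matches every path that starts with it, so "Disallow: /search" also covers an article at /search-engine-basics. Adding a trailing slash limits the rule to the /search/ directory, though it then no longer matches /search itself, so pick the form that fits your URL scheme. The URLs and the blocked helper below are hypothetical; the check uses urllib.robotparser, whose prefix matching agrees with Google's for rules without wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical URLs: an internal search results page and a real article
# whose slug happens to start with "search".
URLS = [
    "https://example.com/search/results",
    "https://example.com/search-engine-basics",
]


def blocked(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs the rules would stop `user_agent` from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


if __name__ == "__main__":
    # Without the slash, the rule also catches the article.
    print(blocked("User-agent: *\nDisallow: /search\n", URLS))
    # With the slash, only the /search/ directory is excluded.
    print(blocked("User-agent: *\nDisallow: /search/\n", URLS))
```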
Does allowing a bot in robots.txt guarantee indexing?
No. robots.txt controls crawling, not indexing. Allowing Googlebot lets it fetch the page; whether Google then indexes it still depends on content quality, noindex tags, and canonicals.
Will blocking AI scrapers lower my rankings?
No, as long as the blocks are scoped to AI user-agents. Search crawlers are separate bots and are unaffected.
Where do I declare my sitemap?
On its own line in robots.txt: Sitemap: https://yoursite.com/sitemap.xml. robot.guard can include it for you when you generate the file.

Last updated June 9, 2026

ready to try robot.guard?

start guarding your site