A developer's guide to managing robots.txt rules effectively

The rules that actually decide behaviour

A crawler picks the most specific User-agent group that matches its name and obeys only that group; it does not merge in rules from a less specific group such as *. (Groups that name the same user-agent token are the exception: RFC 9309 says they are combined into one group.) So if you have a generic User-agent: * block and a specific User-agent: Googlebot block, Googlebot follows its own block exclusively and ignores the wildcard one. Forgetting this is how people accidentally exempt a bot from rules they thought were global.
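The group selection described above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the names parse_groups and select_rules are hypothetical, and user-agent tokens are compared by exact, case-insensitive match only, which is simpler than what real crawlers do.

```python
def parse_groups(text):
    """Return a list of (user_agent_tokens, rules) tuples from robots.txt text."""
    groups = []
    agents, rules = [], []
    seen_rule = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules, seen_rule = [], [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
            seen_rule = True
    if agents:
        groups.append((agents, rules))
    return groups


def select_rules(groups, crawler):
    """Pick the rules for one crawler: its own group(s) if any, else the * group(s)."""
    crawler = crawler.lower()
    specific = [rule for agents, rules in groups if crawler in agents for rule in rules]
    if specific:
        return specific  # the wildcard group is ignored entirely
    return [rule for agents, rules in groups if "*" in agents for rule in rules]


ROBOTS = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

groups = parse_groups(ROBOTS)
googlebot_rules = select_rules(groups, "Googlebot")
other_rules = select_rules(groups, "SomeOtherBot")
```

With this file, googlebot_rules contains only the /drafts/ rule, so /private/ is not blocked for Googlebot unless the rule is repeated in its own group.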

Within a group, modern crawlers resolve Allow versus Disallow by the longest matching path, not by order. Allow: /blog/public beats Disallow: /blog for a URL under /blog/public. Paths are case-sensitive, * matches any sequence of characters anywhere in the path (a trailing * is redundant, because rules already match by prefix), and $ anchors the end of a URL. The wildcards are well supported by Google and Bing but not universally, so keep rules simple.
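A rough sketch of longest-match resolution, assuming the rules for one group have already been collected as (kind, pattern) pairs. The function names are hypothetical, pattern length is measured as plain string length, and ties are resolved in favour of Allow, which is the behaviour RFC 9309 recommends; individual crawlers may differ.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled, case-sensitive regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any sequence of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """rules is a list of (kind, pattern) pairs, kind being 'allow' or 'disallow'."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            length = len(pattern)
            is_allow = kind == "allow"
            if length > best_len or (length == best_len and is_allow):
                best_len, allowed = length, is_allow
    return allowed


rules = [
    ("disallow", "/blog"),
    ("allow", "/blog/public"),
    ("disallow", "/*.pdf$"),
]

public_post = is_allowed(rules, "/blog/public/post")   # longest match is the Allow
draft_post = is_allowed(rules, "/blog/drafts")         # only the Disallow matches
capital_blog = is_allowed(rules, "/Blog/drafts")       # case differs, nothing matches
pdf_file = is_allowed(rules, "/files/report.pdf")      # '*' plus '$' anchor matches
```

Reordering the rules list does not change any of these results, which is the point of longest-match precedence.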

Keep it boring and validated

The safest robots.txt is an unclever one: explicit per-bot groups, minimal wildcards, and no rule you can't explain. Validate every change against a tester before it ships, because there is no runtime error for a bad robots.txt — it just silently changes what gets crawled. Many teams also keep the file in version control so changes are reviewed like any other deploy.
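As one illustration of catching mistakes before they ship, here is a small lint pass that could run in a pre-commit hook or CI job. It is a sketch, not a replacement for a real tester: lint_robots and the KNOWN_FIELDS set are assumptions made for this example, and the checks cover only a few obvious hand-editing errors.

```python
# Fields this sketch accepts; an assumption, adjust to what your crawlers support.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots(text):
    """Return a list of (line_number, message) problems; empty means none found."""
    problems = []
    in_group = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            in_group = True
            if not value:
                problems.append((number, "empty User-agent value"))
        elif field in ("allow", "disallow"):
            if not in_group:
                problems.append((number, "rule appears before any User-agent line"))
            if value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
    return problems


GOOD = """\
# managed file
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""

BAD = """\
Disallow: /tmp
User-agent: *
Disalow: /x
Allow: blog
"""

good_problems = lint_robots(GOOD)
bad_problems = lint_robots(BAD)
```

Here bad_problems flags line 1 (a rule outside any group), line 3 (a misspelled field) and line 4 (a path without a leading slash), while good_problems is empty. A check like this turns a silent crawl change into a failed build.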

robot·guard fits this workflow by removing the hand-editing class of bugs: each toggle emits a correctly-scoped group, the preview is the exact file, and the curated AI list means you are not pasting user-agents from a dozen docs pages. You download a validated file and commit it, instead of editing live and hoping.

how it works

01 group by user-agent
Write one explicit group per crawler; remember that only the most specific group applies.

02 mind precedence
Resolve conflicts with longest-match Allow/Disallow, not line order.

03 validate before deploy
Run the file through a tester; there's no error message for a broken robots.txt.

04 version it
Commit the generated file so robots.txt changes get reviewed like code.

frequently asked

Do all crawlers merge matching user-agent groups?

No. Compliant crawlers obey only the most specific matching group. Rules in a less specific group (like *) are ignored once a more specific group matches. The only groups that get combined are those naming the same user-agent token.

Is Allow or Disallow stronger?

Neither by default. Modern crawlers use the longest matching path, so a more specific Allow overrides a broader Disallow and vice versa. When an Allow and a Disallow match with equal length, RFC 9309 recommends that Allow wins.

Are robots.txt paths case-sensitive?

Yes. /Blog and /blog are different paths. The user-agent token, however, is matched case-insensitively.

Can I comment robots.txt?

Yes. Lines starting with # are comments, and anything after a # on a rule line is ignored as well. robot·guard adds a generated-by comment so it's clear the file is managed.

Last updated June 9, 2026
