What goes into a modern robots.txt
Start from what you want found. Allow the search and social crawlers that send you traffic and link previews, make sure your indexable pages and render resources aren't disallowed, and add a Sitemap: line giving the absolute URL of your sitemap. That covers the discovery half.
Then handle the harvest half. Add explicit blocks for the AI training and answer-engine crawlers you don't want — each scoped to its own user-agent — and any custom rules for paths that shouldn't be crawled at all, like internal search or staging directories. The result is a file that is welcoming and protective at once, instead of all-or-nothing.
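As a rough illustration of the two halves, the file can be assembled from a few lists. This is a minimal sketch, not robot.guard's implementation: the function name `build_robots_txt`, the crawler names, the paths, and the example.com sitemap URL are all assumptions chosen for the example.

```python
# Illustrative inputs only: these names, paths, and the URL are assumptions,
# not robot.guard's maintained list.
ALLOWED_BOTS = ["Googlebot", "Bingbot", "Twitterbot"]
BLOCKED_BOTS = ["GPTBot", "CCBot"]
CUSTOM_DISALLOWS = ["/search", "/staging/"]
SITEMAP_URL = "https://example.com/sitemap.xml"


def build_robots_txt(allowed, blocked, disallows, sitemap_url):
    """Assemble robots.txt text from allow/block choices (hypothetical helper)."""
    lines = []

    # Discovery half: one group shared by the crawlers you want.
    # Path disallows come first, followed by a catch-all Allow.
    lines += [f"User-agent: {bot}" for bot in allowed]
    lines += [f"Disallow: {path}" for path in disallows]
    lines += ["Allow: /", ""]

    # Harvest half: one group per unwanted crawler, scoped to its user-agent.
    for bot in blocked:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]

    # Every other crawler: only the custom path rules apply.
    lines += ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallows]
    lines += [""]

    # The Sitemap line sits outside any group and takes an absolute URL.
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(ALLOWED_BOTS, BLOCKED_BOTS, CUSTOM_DISALLOWS, SITEMAP_URL)
print(robots_txt)
```

The custom path rules are repeated in the allowed group because a crawler that matches a named group follows that group only and ignores the `*` group.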
Why generate it instead of writing it
Hand-writing robots.txt invites three failure modes: syntax mistakes that silently misbehave, an AI blocklist that goes stale within months, and the all-or-nothing trap where people either allow everything or accidentally block search. A generator backed by a maintained crawler list addresses all three: it turns intent into correct directives, keeps the blocklist current, and lets you allow and block selectively.
That is what robot.guard does. You pick the good bots and block the scrapers from a curated, maintained list, add custom rules if you need them, and watch the exact file build in a live preview. When it looks right you download it — valid by construction, current by default, and ready to drop at your site root.
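Whichever way the file is produced, one way to catch the silent failures described above is to parse it and ask what it actually permits. The sketch below uses Python's standard `urllib.robotparser`. The file contents, crawler names, and example.com URLs are assumptions for illustration, and `check_robots_txt` is a hypothetical helper, not part of robot.guard.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file contents; bot names, paths, and the URL are assumptions.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
Disallow: /search
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""


def check_robots_txt(text):
    """Return the access decisions the file produces (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return {
        # A search crawler should still reach indexable pages.
        "search_can_read_page": parser.can_fetch(
            "Googlebot", "https://example.com/blog/post"
        ),
        # Internal search results should stay out of the crawl.
        "search_skips_internal_search": not parser.can_fetch(
            "Googlebot", "https://example.com/search?q=x"
        ),
        # The AI crawler should be refused everywhere.
        "ai_crawler_blocked": not parser.can_fetch(
            "GPTBot", "https://example.com/blog/post"
        ),
        # Sitemap URLs declared in the file.
        "sitemaps": parser.site_maps(),
    }


results = check_robots_txt(ROBOTS_TXT)
print(results)
```

Python's parser applies the rules of a group in file order, which is why the Disallow line precedes the catch-all Allow in the example. Individual crawlers may resolve conflicting rules differently, so treat a check like this as a sanity test, not a guarantee of any one crawler's behavior.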
How it works
- 01 Allow discovery. Whitelist search and social crawlers and add your sitemap line.
- 02 Block the scrapers. Add AI-crawler blocks from a maintained list, each scoped to its user-agent.
- 03 Add custom rules. Disallow any paths that shouldn't be crawled at all.
- 04 Preview and download. Confirm the exact file in the live preview, then download it.
Frequently asked questions
- Can't I just copy a robots.txt template?
- A template gets you started but goes stale and rarely covers current AI crawlers. Generating from a maintained list keeps the AI blocks current and the syntax correct.
- What should every robots.txt include today?
- Allowed search/social crawlers, a sitemap line, scoped blocks for unwanted AI scrapers, and any custom path disallows — nothing that blocks your indexable content or render resources.
- How often should I regenerate it?
- Whenever your site structure changes or new AI crawlers appear. robot.guard's maintained list makes refreshing the AI blocks a quick re-toggle.
- Will a generated file work on any host?
- Yes. robots.txt is host-agnostic plain text: download it and serve it as /robots.txt at your site root, however you serve static files. Crawlers look it up per host, so each subdomain needs its own copy.
Last updated June 9, 2026