Google-Extended is not Googlebot
Googlebot is the crawler that indexes your pages for Google Search. Google-Extended is a distinct robots.txt token, not a separate crawler: Google fetches pages with its existing user agents and reads the token to decide whether already-accessible content may be used to train its generative AI models, including Gemini and the models offered through Vertex AI. Adding a Disallow rule for Google-Extended does not remove you from the index, does not lower your ranking, and does not stop Googlebot from crawling. The two tokens are read and respected separately.
This design is deliberate. Google wanted publishers to have a clean way to say no to AI training without sacrificing search. So when you add a block for Google-Extended, you are sending a narrow, specific request: keep crawling and ranking me normally, but do not use my words to train your AI. That is exactly the kind of granular control robot.guard is built to make easy, since it keeps Googlebot on your whitelist while letting you disallow the training token.
The real tradeoff: training versus answer engines
Here is the part people miss. Google-Extended primarily governs training. Its relationship to features that summarize your content directly in search results, such as AI Overviews, has shifted over time, so it is not a simple on/off switch for those features. What blocking Google-Extended does reliably is opt your content out of training future models, which is the main thing most publishers care about.
So the decision comes down to your goals. If you are protecting original writing, research, or a content business and you do not want it absorbed into model training, blocking Google-Extended is a low-risk move because your search ranking is untouched. If your priority is maximum surface area inside Google's AI products and you are comfortable with your content being used that way, you might leave it open. Either way, you can preview the exact rule in robot.guard before you publish it, so there are no surprises.
How to block Google-Extended safely
Blocking Google-Extended is a two-line addition to robots.txt: name the user-agent, then disallow the whole site for that agent. Because robots.txt rules are grouped per user-agent, this rule applies only to the training token and leaves your Googlebot rules completely intact. As long as you target the right token, there is no risk of accidentally deindexing yourself; naming the wrong token is exactly the kind of mistake a live preview helps you catch.
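As a minimal sketch of that two-line block and a quick sanity check, the Python snippet below embeds the rule and asks the standard library's urllib.robotparser how it treats each token. The example URL and the is_allowed helper are hypothetical names chosen for illustration, and Python's parser only approximates how Google matches tokens, so treat the result as a preview rather than proof of Google's behavior.

```python
from urllib.robotparser import RobotFileParser

# The two-line block: name the training token, then disallow the whole site.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

# The mistake to avoid: the same rule aimed at the search crawler instead.
WRONG_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

# Hypothetical page used only for this check.
URL = "https://example.com/articles/post"


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if this robots.txt text lets `agent` fetch `url`.

    Uses Python's own parser, which is an approximation of Google's matching.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_allowed(ROBOTS_TXT, "Google-Extended", URL))  # False: training token blocked
print(is_allowed(ROBOTS_TXT, "Googlebot", URL))        # True: search crawling untouched
print(is_allowed(WRONG_ROBOTS_TXT, "Googlebot", URL))  # False: wrong token blocks search
```

If the second check ever comes back False, the file is targeting Googlebot and should not be published.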
Remember that robots.txt is a request that compliant crawlers honour, not an enforced wall. Google is a major, identifiable operator that publicly states it respects these tokens, so for Google-Extended specifically you can trust the block. For crawlers that ignore robots.txt entirely, you would pair the file with a firewall, but that is a separate concern from opting out of Google's AI training.
Frequently asked questions
- Will blocking Google-Extended hurt my Google Search ranking?
- No. Google-Extended and Googlebot are separate robots.txt tokens. Disallowing Google-Extended only opts you out of AI training; Googlebot keeps crawling, indexing, and ranking your site exactly as before.
- What does Google-Extended actually control?
- It controls whether Google can use your accessible content to train and improve its generative AI models, including Gemini and models offered through Vertex AI. It is an opt-out signal for AI training, not for search indexing.
- Does blocking Google-Extended remove me from AI Overviews?
- Google-Extended primarily governs model training, and its relationship to live search features like AI Overviews has changed over time. Treat it as a reliable training opt-out rather than a guaranteed switch for in-search AI summaries.
- How do I add the Google-Extended block?
- Add a robots.txt block that names the Google-Extended user-agent and disallows your whole site for it. In robot.guard you toggle it on, preview the exact file, and download it, while Googlebot stays whitelisted.
Last updated June 9, 2026