What is Google-Extended (and should you block it)?

the short answer

Google-Extended is a robots.txt token that controls whether Google can use your content to train its AI models, such as Gemini. Blocking it has no effect on your Google Search ranking, because search crawling and indexing are governed by a separate token, Googlebot.

If you have been reading about robots.txt lately, you may have seen a user-agent token called Google-Extended and wondered whether blocking it would tank your search traffic. The short answer is no. Google-Extended is not Googlebot. It is a separate control token Google introduced so site owners can decide whether their content is used to train and improve Google's generative AI products, independent of how the site is crawled and ranked for regular search.

That separation is the whole point, and it is good news. It means you can opt out of feeding the AI training pipeline while keeping every bit of your Google Search visibility. But there is a nuance worth understanding before you flip the switch, because Google-Extended sits close to features like AI Overviews, and the tradeoff is not always obvious from the name alone.

~1 in 3 of the top 1,000 websites now disallow at least one AI crawler in robots.txt (Originality.ai, 2024)

Google-Extended is not Googlebot

Googlebot is the crawler that indexes your pages for Google Search. Google-Extended is a distinct user-agent token that governs whether Google may use already-accessible content to train its generative AI models, including Gemini and the models offered through Vertex AI. Adding a Disallow rule for Google-Extended does not remove you from the index, does not lower your ranking, and does not stop Googlebot from crawling. The two tokens are read and respected separately.

This design is deliberate. Google wanted publishers to have a clean way to say no to AI training without sacrificing search. So when you add a block for Google-Extended, you are sending a narrow, specific request: keep crawling and ranking me normally, but do not use my words to train your AI. That is exactly the kind of granular control robot.guard is built to make easy, since it keeps Googlebot on your whitelist while letting you disallow the training token.

The real tradeoff: training versus answer engines

Here is the part people miss. Google-Extended primarily governs training, not the features that summarize your content live in search results. Its relationship to features such as AI Overviews has shifted over time, so it is not a clean on/off lever for them. What blocking Google-Extended does reliably is opt your content out of being used to train future models, which is the main thing most publishers care about.

So the decision comes down to your goals. If you are protecting original writing, research, or a content business and you do not want it absorbed into model training, blocking Google-Extended is a low-risk move because your search ranking is untouched. If your priority is maximum surface area inside Google's AI products and you are comfortable with your content being used that way, you might leave it open. Either way, you can preview the exact rule in robot.guard before you publish it, so there are no surprises.

How to block Google-Extended safely

Blocking Google-Extended is a two-line addition to robots.txt: name the user-agent, then disallow the whole site for that agent. Because robots.txt is parsed per user-agent, this rule applies only to the training token and leaves your Googlebot rules completely intact. As long as you target the right token, there is no risk of deindexing yourself. Targeting the wrong token is the kind of mistake a live preview helps you avoid.
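The two-line rule described above can be sketched and sanity-checked locally. The example below is a minimal sketch, not robot.guard's output: the existing Googlebot group, the /private/ path, and the example.com URLs are hypothetical, and Python's standard-library parser only approximates how Google reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an existing Googlebot group plus the
# two-line Google-Extended addition described in the text.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ARTICLE = "https://example.com/articles/post"   # hypothetical URL
PRIVATE = "https://example.com/private/draft"   # hypothetical URL

# The training token is disallowed everywhere.
training_allowed = is_allowed(ROBOTS_TXT, "Google-Extended", ARTICLE)
# Googlebot still follows its own group: public pages stay open...
search_allowed = is_allowed(ROBOTS_TXT, "Googlebot", ARTICLE)
# ...and its pre-existing Disallow rule is unaffected by the new group.
search_private_allowed = is_allowed(ROBOTS_TXT, "Googlebot", PRIVATE)

print(training_allowed, search_allowed, search_private_allowed)
```

Under this parser the Google-Extended check comes back disallowed while Googlebot keeps exactly the access its own group grants, which is the per-token behaviour the paragraph above relies on.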

Remember that robots.txt is a request that compliant crawlers honour, not an enforced wall. Google is a major, identifiable operator that publicly states it respects these tokens, so for Google-Extended specifically you can trust the block. For crawlers that ignore robots.txt entirely, you would pair the file with a firewall, but that is a separate concern from opting out of Google's AI training.
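For crawlers that ignore robots.txt, the enforcement has to happen on the server. The sketch below shows one simple form of that idea as a WSGI middleware that refuses requests by user-agent substring. It is an illustration under assumptions, not a real firewall: the bot name "ExampleScraperBot", the block_agents helper, and the demo app are all hypothetical, and user-agent headers can be spoofed. It also cannot target Google-Extended, which Google documents as a robots.txt control token without its own request user-agent string.

```python
# Hypothetical list of user-agent substrings to refuse (lowercase).
BLOCKED_AGENT_SUBSTRINGS = ("examplescraperbot",)


def block_agents(app, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app and answer 403 to blocked user-agents."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else passes through to the wrapped application.
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


def status_for(app, user_agent):
    """Call a WSGI app with a fake request and return the status line."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/",
               "HTTP_USER_AGENT": user_agent}
    app(environ, start_response)
    return captured["status"]


guarded = block_agents(demo_app)
blocked_status = status_for(
    guarded, "Mozilla/5.0 (compatible; ExampleScraperBot/1.0)")
normal_status = status_for(
    guarded, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
print(blocked_status, normal_status)
```

Substring matching on a header is the weakest form of enforcement, so treat this as a complement to robots.txt rather than a replacement for a proper firewall.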

frequently asked

Will blocking Google-Extended hurt my Google Search ranking?
No. Google-Extended and Googlebot are separate robots.txt tokens. Disallowing Google-Extended only opts you out of AI training; Googlebot keeps crawling, indexing, and ranking your site exactly as before.

What does Google-Extended actually control?
It controls whether Google can use your accessible content to train and improve its generative AI models, including Gemini and models offered through Vertex AI. It is an opt-out signal for AI training, not for search indexing.

Does blocking Google-Extended remove me from AI Overviews?
Google-Extended primarily governs model training, and its relationship to live search features like AI Overviews has changed over time. Treat it as a reliable training opt-out rather than a guaranteed switch for in-search AI summaries.

How do I add the Google-Extended block?
Add a robots.txt block that names the Google-Extended user-agent and disallows your whole site for it. In robot.guard you toggle it on, preview the exact file, and download it, while Googlebot stays whitelisted.

Last updated June 9, 2026
