Google · training bot · Last updated 2026-05-22
How to allow or block Google-Extended (Gemini training + Google AI corpora) in your robots.txt, and why allowing Googlebot does NOT automatically allow Google-Extended.
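Google-Extended is a separate user-agent token from Googlebot. Googlebot crawls for Google Search. Google-Extended controls whether content Google crawls from your site may be used for Gemini training and Google's other AI corpora. Per Google's crawler documentation it is a standalone robots.txt product token, not a crawler with its own HTTP user-agent string, so it will not appear under that name in server logs. Critically, rules written for Googlebot do NOT apply to Google-Extended: each token follows its own User-agent group and falls back to the wildcard (User-agent: *) group if it has none. Google introduced this split in 2023 specifically to let publishers opt out of AI training while keeping Search visibility.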
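If you allow Googlebot (most sites do) through its own group but keep a restrictive wildcard group and have no Google-Extended group, Google-Extended falls back to the wildcard rules and your content is excluded from Gemini training. With no matching group and no restrictive wildcard, Google-Extended is allowed by default, but an explicit group makes the intent auditable. Gemini is Google's flagship AI assistant, so being excluded means missing one of the highest-reach AI surfaces, especially for B2B and enterprise queries where Gemini's Workspace integration drives discovery. This is the single most common AI-visibility gap BrandCited finds in audits.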
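A minimal sketch of how this gap can arise, assuming a site whose wildcard group blocks all crawlers and whose only exception is Googlebot. The layout is illustrative and not taken from any audited site. Google-Extended has no group of its own here, so it uses the wildcard rules and is blocked even though Googlebot is allowed:

```
# Illustrative only: restrictive wildcard with a Googlebot exception
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

# No Google-Extended group exists, so that token follows the * group
# above and is disallowed everywhere.
```

Adding a User-agent: Google-Extended group with its own Allow and Disallow lines closes the gap.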
BrandCited recommendation
Most brands should allow Google-Extended. The exception is publishers with paid premium content who deliberately want to opt out of AI training while keeping Search visibility, which is the opt-out pattern Google-Extended was designed for.
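The directives to add to your robots.txt for Google-Extended. Pasting at the end of your file is fine: a crawler follows the most specific group that names it instead of the wildcard group, wherever that group sits in the file. Because the wildcard rules are not merged in, each block repeats the Disallow lines for private paths.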
robots.txt
# Allow Google-Extended for Gemini training
User-agent: Google-Extended
Allow: /
Disallow: /admin
Disallow: /api/
Disallow: /dashboard
# Note: Googlebot rules are NOT inherited by Google-Extended.
# Give each token its own group so both have explicit rules.
User-agent: Googlebot
Allow: /
Disallow: /admin
Disallow: /api/
Disallow: /dashboard
Allowing Googlebot does not allow Google-Extended. They are independent user-agent tokens with independent rules. Google introduced this separation in 2023 specifically so publishers can opt out of AI training while keeping Search. A dedicated User-agent: Google-Extended block is how you set rules for Gemini training explicitly; without one, Google-Extended follows your wildcard group.
Gemini is the consumer-facing AI assistant; Google-Extended is the robots.txt token that controls whether content Google crawls from your site feeds Gemini's training data and Google's broader AI corpora. Blocking Google-Extended means future Gemini model versions will not be trained on your content. Knowledge already in a trained Gemini model persists until the next model retrain.
For paywalled premium content where you don't want AI-summarised excerpts undermining subscription value, blocking makes sense. Add Disallow rules for those paths under User-agent: Google-Extended (or Disallow: / to opt out the whole site) while keeping Googlebot allowed. This is the exact opt-out pattern Google designed Google-Extended to support.
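A sketch of that opt-out pattern, assuming the paid content lives under a hypothetical /premium/ path (substitute your own paths). Google-Extended is kept out of the premium section while Googlebot can still crawl the whole site for Search:

```
# Hypothetical path: replace /premium/ with your paywalled sections
User-agent: Google-Extended
Disallow: /premium/

# Search crawling is unaffected by the group above
User-agent: Googlebot
Allow: /
```

Anything not matched by a Disallow line in the Google-Extended group stays allowed for that token, so the rest of the site remains available for AI training under this sketch.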
The canonical reference is https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers (Google's crawler overview). It lists all Google crawlers including Google-Extended, with the specific note about robots.txt rules being independent of Googlebot.
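Google operates other tokens beyond Google-Extended. GoogleOther is a generic crawler used for tasks such as one-off research crawls. Google-NotebookLM fetches sources that users add to NotebookLM; Google lists it among its user-triggered fetchers, which generally ignore robots.txt rules, so an entry for it may have no effect. These are typically more limited and less critical than Google-Extended, but GoogleOther is worth including in a complete robots.txt allowlist if you want maximum coverage.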
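A sketch of an additional group for GoogleOther, assuming the same private paths used in the Google-Extended block earlier in this guide. The same shape would apply to any other token you choose to list:

```
# Optional: explicit rules for Google's generic crawler
User-agent: GoogleOther
Allow: /
Disallow: /admin
Disallow: /api/
Disallow: /dashboard
```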
Cite this guide
BrandCited. (2026). Google-Extended robots.txt — How to Allow, Block, or Audit. https://www.brandcited.ai/tools/robots-txt-auditor/google-extended
Each major AI engine operates one or more user-agents. Configure them in parallel for complete coverage.
robots.txt is one of dozens of AI ranking factors BrandCited audits. Run a free scan to also check schema completeness, llms.txt configuration, content structure, entity recognition, and AI citation share-of-voice.