OpenAI · training bot · Last updated 2026-05-22
How to allow or block GPTBot (OpenAI ChatGPT training crawler) in your robots.txt. Why most brands should allow it, when blocking makes sense, and the exact directive to use.
GPTBot is OpenAI's crawler for collecting training data for ChatGPT and future GPT models. It is one of three OpenAI user-agents (alongside OAI-SearchBot for SearchGPT indexing and ChatGPT-User for real-time fetches during conversations). GPTBot follows standard robots.txt rules and respects User-agent: GPTBot blocks. When allowed, it crawls accessible pages on a paced schedule and the content becomes part of the corpus for the next model training cycle.
GPTBot is the single most consequential AI crawler for long-term ChatGPT visibility. Pages it can read become candidates for citation when users ask ChatGPT questions on related topics months or years later. Blocking GPTBot today doesn't immediately remove your brand from ChatGPT (current training data persists), but it caps your future growth in the engine. Most brands should allow GPTBot — it's the deposit you make to be cited tomorrow.
BrandCited recommendation
Allow for marketing sites, blogs, documentation, and any public content you want cited by ChatGPT. Block it only if you have a legal or licensing reason (for example, paywalled premium content or exclusive partnerships). Keep allow rules for ChatGPT-User and OAI-SearchBot to preserve browse-mode visibility even when GPTBot is blocked.
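The exact directive to add to your robots.txt for GPTBot. Paste it anywhere in the file, conventionally at the end: a crawler that follows the Robots Exclusion Protocol applies the group that names it specifically in place of the wildcard (User-agent: *) group, regardless of order.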
robots.txt
# Allow GPTBot full access
User-agent: GPTBot
Allow: /
Disallow: /admin
Disallow: /api/
Disallow: /dashboard
Disallow: /onboarding
# To OPT OUT of ChatGPT training while keeping real-time fetches working:
# User-agent: GPTBot
# Disallow: /
#
# User-agent: ChatGPT-User
# Allow: /
#
# User-agent: OAI-SearchBot
# Allow: /
Blocking GPTBot does not immediately remove your brand from ChatGPT. Current ChatGPT model knowledge comes from training data already collected, so blocking GPTBot today caps future training inclusion, not present citations. For real-time citations in ChatGPT browse mode, the relevant bot is ChatGPT-User (a separate user-agent). Allowing GPTBot is about feeding the next model training cycle.
GPTBot identifies itself with a User-Agent string that contains the token "GPTBot/" followed by a version number. Grep your access logs for "GPTBot/"; most CDN and server logs surface the User-Agent directly. Because the header can be spoofed, verify the client IP against the source IP ranges OpenAI publishes at https://platform.openai.com/docs/bots.
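The log check described above can be sketched in Python. This is a minimal illustration, not OpenAI's verification procedure: the function name, the log-line regex (which assumes the combined log format with the User-Agent as the last quoted field), and the CIDR range are all assumptions. The range shown is a reserved documentation block and must be replaced with the ranges OpenAI actually publishes.

```python
import ipaddress
import re

# Hypothetical range for illustration only (a reserved documentation block);
# substitute the ranges OpenAI actually publishes.
ASSUMED_GPTBOT_RANGES = [ipaddress.ip_network("192.0.2.0/24")]

# Assumes combined log format: client IP first, User-Agent as the last quoted field.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*"(?P<ua>[^"]*)"\s*$')


def classify_gptbot_hits(log_lines, ranges=ASSUMED_GPTBOT_RANGES):
    """Split requests claiming to be GPTBot into verified and unverified client IPs."""
    verified, unverified = [], []
    for line in log_lines:
        match = LINE_RE.match(line)
        if not match or "GPTBot/" not in match.group("ua"):
            continue  # not a GPTBot-labelled request
        try:
            ip = ipaddress.ip_address(match.group("ip"))
        except ValueError:
            continue  # first field was a hostname or malformed
        bucket = verified if any(ip in net for net in ranges) else unverified
        bucket.append(str(ip))
    return verified, unverified
```

Requests that land in the unverified list carry the GPTBot label but come from outside the configured ranges, which is what a spoofed User-Agent would look like.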
A Crawl-delay can be set for GPTBot, but be careful. A Crawl-delay of 30 seconds limits GPTBot to one request every 30 seconds; at that pace, a 5,000-page site takes about 42 hours per full crawl (5,000 × 30 s = 150,000 s). Old SEO robots.txt templates often set this; for AI training crawlers it slows collection of your pages with no upside. Either remove Crawl-delay entirely or set it to 1-2 seconds.
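The crawl-time arithmetic can be checked with a few lines of Python. This is a rough lower bound that assumes exactly one request per delay interval and ignores fetch time; the function name is made up for illustration.

```python
def full_crawl_hours(page_count: int, crawl_delay_seconds: float) -> float:
    """Lower bound on hours to fetch every page once at one request per delay."""
    return page_count * crawl_delay_seconds / 3600


# Compare the 30-second template value with the 1-2 second range.
for delay in (30, 2, 1):
    print(f"Crawl-delay {delay}s: {full_crawl_hours(5000, delay):.1f} hours")
```

For a 5,000-page site this gives roughly 41.7 hours at 30 seconds, 2.8 hours at 2 seconds, and 1.4 hours at 1 second.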
You can allow GPTBot on public pages while blocking specific paths. The standard pattern: Allow: / globally for GPTBot, then Disallow specific paths such as /admin, /api/, /dashboard, and /onboarding. This gives the training pipeline access to public marketing pages while keeping authenticated app surfaces private.
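To see why a blanket Allow: / and narrower Disallow lines can coexist, here is a minimal sketch of the longest-match rule from the Robots Exclusion Protocol (RFC 9309): the longest matching path wins, and Allow wins a tie. It is an illustration under simplifying assumptions (plain prefix matching, no * or $ wildcards), not GPTBot's actual matcher; the names is_allowed and GPTBOT_RULES are invented for this example.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Evaluate (directive, path_prefix) rules for a single user-agent group.

    Longest matching prefix wins; on a tie, Allow wins; no match means allowed.
    Wildcards (* and $) are deliberately not handled in this sketch.
    """
    best_len, allowed = -1, True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules do not apply
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


# The GPTBot group from this guide, expressed as data.
GPTBOT_RULES = [
    ("Allow", "/"),
    ("Disallow", "/admin"),
    ("Disallow", "/api/"),
    ("Disallow", "/dashboard"),
    ("Disallow", "/onboarding"),
]
```

Under these rules /blog/post is allowed and /admin/users is blocked. Two details follow from prefix matching: /administrator is also blocked by Disallow: /admin, and the bare path /api stays allowed because Disallow: /api/ only matches paths that include the trailing slash.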
OpenAI runs three distinct bots. GPTBot collects training data (slow, batched). OAI-SearchBot indexes pages for SearchGPT real-time search (faster, fresher). ChatGPT-User fetches a specific URL when a ChatGPT user references it in a conversation (on-demand, per query). Block one and you affect only that surface.
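A quick way to audit which of the three surfaces a given robots.txt shuts off is sketched below. It is a simplified illustration: it only detects a blanket Disallow: / in the group that applies to each bot (falling back to the User-agent: * group when the bot is not named), and it ignores partial path rules. The function names and the surface labels are assumptions made for this example.

```python
OPENAI_AGENTS = {
    "GPTBot": "model training",
    "OAI-SearchBot": "search indexing",
    "ChatGPT-User": "on-demand fetches",
}


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and current:
            in_rules = True
            for agent in current:
                groups[agent].append((field, value))
    return groups


def blocked_surfaces(robots_txt: str) -> list[str]:
    """List OpenAI surfaces whose applicable group has a blanket 'Disallow: /'."""
    groups = parse_groups(robots_txt)
    blocked = []
    for agent, surface in OPENAI_AGENTS.items():
        rules = groups.get(agent.lower(), groups.get("*", []))
        if ("disallow", "/") in rules and ("allow", "/") not in rules:
            blocked.append(surface)
    return blocked
```

Run against the opt-out pattern from this guide (GPTBot disallowed, the other two allowed), the function reports only the training surface as blocked, which matches the point that each bot is controlled independently.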
Cite this guide
BrandCited. (2026). GPTBot robots.txt — How to Allow, Block, or Audit. https://www.brandcited.ai/tools/robots-txt-auditor/gptbot
Each major AI engine operates one or more user-agents. Configure them in parallel for complete coverage.