Free tool · No signup · Last updated 2026-05-22
Paste a URL or your robots.txt content. The auditor checks every reputable AI training and search bot — OpenAI's GPTBot / OAI-SearchBot / ChatGPT-User, Anthropic's ClaudeBot / Claude-SearchBot / Claude-User, Google-Extended, PerplexityBot, Apple Intelligence (Applebot-Extended), Mistral, Meta, ByteDance, Yandex, plus the long tail — and reports which bots can crawl your site, which are blocked, and which you forgot to list. 64 bots, one click, no signup.
robots.txt is a plain-text file at the root of your domain (yoursite.com/robots.txt) that tells crawlers which paths they can and cannot fetch. It was designed for search engine bots in the 1990s and has worked the same way ever since: a series of User-agent blocks, each followed by Allow and Disallow directives. Compliant crawlers read the file, find the block that names them (falling back to the wildcard * block if none does), and obey what it says.
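The lookup rule in the last sentence can be sketched in TypeScript. This is a minimal sketch, not the auditor's own code: the helper names `parseRobots` and `rulesFor` are hypothetical, matching is case-insensitive and merges every group that names the bot (as RFC 9309 describes), and path wildcards, partial user-agent matching, and non-standard fields such as Crawl-delay are out of scope.

```typescript
// One Allow or Disallow line inside a group.
interface Rule {
  allow: boolean;
  path: string;
}

// One group: a run of User-agent lines plus the rules that follow them.
interface Group {
  agents: string[]; // lower-cased product tokens, e.g. "gptbot" or "*"
  rules: Rule[];
}

// Hypothetical helper: split robots.txt text into groups (simplified).
function parseRobots(text: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let lastWasAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // strip comments and whitespace
    const colon = line.indexOf(":");
    if (colon === -1) continue; // blank or malformed line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; otherwise start a new one.
      if (current === null || !lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((field === "allow" || field === "disallow") && current !== null) {
      current.rules.push({ allow: field === "allow", path: value });
      lastWasAgent = false;
    }
    // Other fields (Sitemap, Crawl-delay, ...) are ignored in this sketch.
  }
  return groups;
}

// Rules a crawler should obey: every group naming it, else the "*" group(s).
function rulesFor(groups: Group[], userAgent: string): Rule[] {
  const token = userAgent.toLowerCase();
  const own = groups.filter((g) => g.agents.indexOf(token) !== -1);
  const chosen = own.length > 0 ? own : groups.filter((g) => g.agents.indexOf("*") !== -1);
  const rules: Rule[] = [];
  for (const g of chosen) rules.push(...g.rules);
  return rules;
}

const robots = ["User-agent: Googlebot", "Allow: /", "", "User-agent: *", "Disallow: /"].join("\n");

const groups = parseRobots(robots);
console.log(JSON.stringify(rulesFor(groups, "Googlebot")));       // [{"allow":true,"path":"/"}]
console.log(JSON.stringify(rulesFor(groups, "Google-Extended"))); // [{"allow":false,"path":"/"}]
```

The example shows why a Googlebot-only Allow rule does not cover Google-Extended: with no group of its own, Google-Extended falls through to the wildcard block and inherits its Disallow.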
The trouble: AI bots have proliferated. As of 2026 there are roughly 40 reputable AI training and search user-agents across 13+ owners — and every quarter another lands. Default robots.txt files written for Googlebot in 2018 routinely block (or fail to acknowledge) GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, MistralAI-User, Bytespider, and dozens of others. The result: a site can rank fine in Google Search but be entirely invisible to Gemini, ChatGPT, Claude, Apple Intelligence, or any other AI engine because the bot saw a Disallow rule meant for general crawlers.
This auditor checks 64 bots in one pass against either a live URL or pasted content. The score reflects coverage; the per-bot table shows exactly which engines you've been quietly blocking; the suggested-fix block is copy-paste ready.
Google's robots.txt tester (now inside Search Console) checks Googlebot specifically. Third-party robots.txt parsers like technicalseo.com or merkle-inc.com's checker focus on classical SEO crawlers. Five reasons a dedicated AI-bot pass matters:
Five steps. Total time ~2 minutes including reading the report.
URL mode fetches /robots.txt from the live domain. Paste mode lets you audit a local file before deploying — useful in CI/CD or before a migration.
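URL mode only needs the site's origin, because robots.txt always lives at the root of a host. A minimal sketch of that normalization step using the standard `URL` class; the helper name `robotsUrlFor` and the choice to default to `https://` when the scheme is omitted are assumptions, not a description of the tool's implementation.

```typescript
// Hypothetical helper: map whatever the user typed to the site's robots.txt URL.
function robotsUrlFor(input: string): string {
  const trimmed = input.trim();
  // Assumption: default to HTTPS when no scheme is given (e.g. "example.com").
  const withScheme = /^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
  const url = new URL(withScheme); // throws a TypeError on unparseable input
  // robots.txt applies per scheme + host + port, so drop path, query and hash.
  return `${url.origin}/robots.txt`;
}

console.log(robotsUrlFor("example.com"));                           // https://example.com/robots.txt
console.log(robotsUrlFor("https://www.example.com/blog/post?x=1")); // https://www.example.com/robots.txt
```

In a browser the result would then be passed to `fetch()`; whether the response is readable depends on the target site's CORS headers.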
The audit runs entirely client-side: no data leaves your browser except the fetch of the robots.txt URL, which succeeds only where the target site's CORS policy permits it. The tool parses your robots.txt and checks each of the 64 bots.
Each row shows the bot name, owner, status (Allow / Blocked / Not mentioned), and purpose. Rows are color-coded by status so you can scan them quickly: blocked rows are highlighted in rose, not-mentioned in amber, explicit allows in emerald.
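When any bots are blocked, the tool emits a copy-paste fix block. Paste it at the end of your robots.txt. Position in the file doesn't matter to compliant crawlers: a bot that finds a group naming it uses that group and ignores the wildcard (*) group, wherever each one appears. That also means any Disallow rules you still want for those bots must be repeated inside their own group.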
Save the updated robots.txt, deploy, then re-run against the live URL. Most AI engines re-check robots.txt within 24-72 hours of the next crawl cycle.
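The fix block described above is plain text, so generating one is straightforward. A minimal sketch, assuming you already have the list of blocked bot names and the paths you still want to keep private; the function name `buildFixBlock` and its `gatedPaths` parameter are hypothetical. Because a bot that matches its own group ignores the * group entirely, the private paths are repeated inside the new group.

```typescript
// Hypothetical helper: emit one robots.txt group that re-allows the given bots.
function buildFixBlock(blockedBots: string[], gatedPaths: string[] = []): string {
  if (blockedBots.length === 0) return "";
  const lines: string[] = [];
  // Consecutive User-agent lines form a single group shared by all listed bots.
  for (const bot of blockedBots) lines.push(`User-agent: ${bot}`);
  // Re-state private paths: a bot-specific group does not inherit rules from "*".
  for (const path of gatedPaths) lines.push(`Disallow: ${path}`);
  // Allowing "/" is the default, but stating it makes the intent explicit.
  lines.push("Allow: /");
  return lines.join("\n") + "\n";
}

console.log(buildFixBlock(["GPTBot", "ClaudeBot"], ["/admin", "/api"]));
```

This sketch puts all bots in one shared group; writing a separate group per bot is equally valid. The longer Disallow paths still win over Allow: / for /admin and /api, because the most specific matching rule applies.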
64 bots across 35 owners. Maintained in src/lib/seo/ai-bots.ts and updated as new engines launch.
| Bot | Owner | Purpose |
|---|---|---|
| Googlebot | Google | Google Search primary crawler |
| Googlebot-Image | Google | Google Images |
| Googlebot-News | Google | Google News |
| Googlebot-Video | Google | Google Video / YouTube discovery |
| Google-Extended | Google | Gemini training + Google AI corpora (distinct from Search) |
| GoogleOther | Google | Google internal research / one-off crawls |
| Google-NotebookLM | Google | NotebookLM source ingestion |
| Google-Agent | Google | Browser-using AI agents (Project Mariner); distinct from Googlebot and Google-Extended (announced May 2026) |
| Google-Pinpoint | Google | Journalism research tool / Pinpoint document ingestion |
| Google-CWS | Google | Chrome Web Store extension verification |
| AdsBot-Google | Google | Google Ads landing-page quality crawl |
| Storebot-Google | Google | Google Shopping product feed validation |
| Bingbot | Microsoft | Bing Search + Copilot ground truth |
| BingPreview | Microsoft | Bing snapshot previews |
| msnbot | Microsoft | Legacy MSN crawler (still respected) |
| GPTBot | OpenAI | Training data for GPT models |
| OAI-SearchBot | OpenAI | SearchGPT real-time indexing |
| ChatGPT-User | OpenAI | Real-time fetch on behalf of a ChatGPT user |
| ClaudeBot | Anthropic | Training data crawler |
| Claude-SearchBot | Anthropic | Real-time search indexing |
| Claude-User | Anthropic | Real-time fetch on behalf of a Claude user |
| anthropic-ai | Anthropic | Legacy UA (pre-rebrand) — still in some training pipelines |
| PerplexityBot | Perplexity | Indexing crawler |
| Perplexity-User | Perplexity | Real-time fetch during user queries |
| Applebot | Apple | Siri + Spotlight search |
| Applebot-Extended | Apple | Apple Intelligence training (distinct from Applebot) |
| DuckDuckBot | DuckDuckGo | DuckDuckGo search index |
| DuckAssistBot | DuckDuckGo | DuckAssist AI answers |
| Meta-ExternalAgent | Meta | Llama training + Meta AI retrieval |
| Meta-ExternalFetcher | Meta | On-demand fetches for Meta AI |
| FacebookBot | Meta | Open Graph + sharing previews |
| YandexBot | Yandex | Yandex Search + Alice + YandexGPT |
| YandexImages | Yandex | Yandex image index |
| Bytespider | ByteDance | Doubao / Toutiao / TikTok AI training |
| MistralAI-User | Mistral | Real-time retrieval for Le Chat |
| cohere-ai | Cohere | Cohere Command training |
| cohere-training-data-crawler | Cohere | Newer Cohere training UA |
| YouBot | You.com | You.com search + AI modes |
| Amazonbot | Amazon | Alexa, Amazon Q, Rufus |
| Brave-Search-Bot | Brave | Brave Search + Leo AI |
| Kagibot | Kagi | Kagi premium search + assistant |
| Diffbot | Diffbot | Entity / Knowledge Graph extraction (feeds many AI products) |
| ImagesiftBot | TheHive | Image entity extraction |
| CCBot | Common Crawl | Open dataset that feeds most open-source LLMs |
| Baiduspider | Baidu | Baidu search + ERNIE AI training (China) |
| Baidu-YJK | Baidu | Baidu academic / specialized index |
| NaverBot | Naver | Naver search (Korea) |
| Yeti | Naver | Naver image and content crawler (Korea) |
| Sogou web spider | Sogou | Sogou search + AI (China) |
| Sogou-AI | Sogou | Sogou AI training corpus |
| Qwantify | Qwant | Qwant search (EU privacy-focused) |
| YandexAdditional | Yandex | Yandex secondary crawler / YandexGPT training |
| PetalBot | Huawei | Huawei Petal Search + Petal AI |
| YisouSpider | Yisou | Yisou search (China) — used by Quark AI |
| PhindBot | Phind | Phind developer-focused AI search |
| KomoBot | Komo | Komo AI search assistant |
| VectaraCrawler | Vectara | Vectara RAG infrastructure crawler |
| Andibot | Andi | Andi conversational search |
| NeevaBot | Snowflake (ex-Neeva) | Snowflake / former Neeva search corpus |
| FriendlyCrawler | Friendly | Friendly AI content crawler |
| AwarioRssBot | Awario | Awario brand-mention monitoring (feeds many GEO products) |
| iaskspider | iAsk | iAsk.ai conversational answer engine |
| AhrefsBot | Ahrefs | Ahrefs index (used by Ahrefs Brand Radar GEO product) |
| SemrushBot | Semrush | Semrush index (used by Semrush AI Visibility) |
How this auditor stacks up against other robots.txt checkers as of 2026.
| Tool | Bots checked | AI bots covered | Fix block output | Cost |
|---|---|---|---|---|
| BrandCited (this tool) | 64 | Yes | Yes | Free |
| Google Search Console robots.txt tester | 1 (Googlebot) | No | | Free (requires Google Search Console) |
| TechnicalSEO.com robots.txt tester | ~10 | Partial | | Free |
| Merkle robots.txt checker | ~6 | Partial | | Free |
| DarkVisitors AI crawler list | 100+ (catalog only, no audit) | | | Free + paid blocking service |
These show up in 50-70% of robots.txt files BrandCited audits.
Agencies, consultants, and developer-tooling sites are welcome to embed the auditor. No fee, no signup, no usage limits. The iframe stays branded with a small "Powered by BrandCited" badge.
Embed code
Copy and paste into your HTML:
```html
<iframe
src="https://www.brandcited.ai/tools/robots-txt-auditor?embed=1"
width="100%"
height="900"
frameborder="0"
loading="lazy"
title="robots.txt Checker for AI Crawlers by BrandCited"
></iframe>
```
Whether each of 64 AI search and training bots can crawl your site. For every bot we check: is there a User-agent block for it specifically? Does it inherit from the wildcard (*)? Is the root path (/) allowed or disallowed? The output is a per-bot table plus a 0-100 score weighted by how widely used the engine is.
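The three checks above map to a small decision procedure. A minimal sketch, assuming robots.txt has already been parsed into groups of lower-cased agent names and Allow/Disallow rules; the `classify` name, the types, and the choice to report a bot as Blocked when the inherited wildcard group disallows / are assumptions, not BrandCited's implementation. Only the root path is evaluated, and wildcard patterns such as /* are not expanded.

```typescript
type Status = "Allow" | "Blocked" | "Not mentioned";

interface RuleLine {
  allow: boolean;
  path: string;
}

interface ParsedGroup {
  agents: string[]; // lower-cased User-agent values, e.g. "gptbot" or "*"
  rules: RuleLine[];
}

// Is "/" crawlable? Only rules whose path is exactly "/" are considered here.
// If both "Allow: /" and "Disallow: /" are present, Allow wins the tie (RFC 9309).
function rootAllowed(rules: RuleLine[]): boolean {
  const rootRules = rules.filter((r) => r.path === "/");
  if (rootRules.length === 0) return true; // no matching rule means allowed
  return rootRules.some((r) => r.allow);
}

// Hypothetical classifier mirroring the three checks described above.
function classify(groups: ParsedGroup[], bot: string): Status {
  const token = bot.toLowerCase();
  const own = groups.filter((g) => g.agents.indexOf(token) !== -1);
  if (own.length > 0) {
    // 1. A dedicated group exists: it alone decides (the wildcard is ignored).
    const rules: RuleLine[] = [];
    for (const g of own) rules.push(...g.rules);
    return rootAllowed(rules) ? "Allow" : "Blocked";
  }
  // 2. No dedicated group: the bot inherits the wildcard group(s), if any.
  const inherited: RuleLine[] = [];
  for (const g of groups) {
    if (g.agents.indexOf("*") !== -1) inherited.push(...g.rules);
  }
  // 3. Evaluate "/" under the inherited rules.
  return rootAllowed(inherited) ? "Not mentioned" : "Blocked";
}

const groups: ParsedGroup[] = [
  { agents: ["googlebot"], rules: [{ allow: true, path: "/" }] },
  { agents: ["gptbot"], rules: [{ allow: false, path: "/" }] },
  { agents: ["*"], rules: [{ allow: false, path: "/admin" }] },
];

for (const bot of ["Googlebot", "GPTBot", "ClaudeBot"]) {
  console.log(bot, classify(groups, bot));
}
// Googlebot Allow
// GPTBot Blocked
// ClaudeBot Not mentioned
```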
Default robots.txt files written for Google often accidentally block specific AI bots that have separate user-agents (Google-Extended, GPTBot, ClaudeBot, PerplexityBot, Applebot-Extended). A site can rank fine in Google Search but be entirely invisible to Gemini, ChatGPT, Claude, or Apple Intelligence — because those AI bots saw a Disallow rule meant for general crawlers. The auditor catches this in seconds.
No. Allowing Googlebot does NOT automatically allow Google-Extended. Google-Extended is the bot Gemini and Google AI use for training and retrieval — it is a separate user-agent with its own robots.txt rules. The same applies to OpenAI: GPTBot for training, OAI-SearchBot for SearchGPT indexing, ChatGPT-User for real-time fetches. You need explicit Allow blocks per bot.
GPTBot is OpenAI's crawler for collecting training data for ChatGPT. Allowing it means your content can be included in future model training, which contributes to brand mentions when users ask ChatGPT questions. Blocking GPTBot specifically (while still allowing OAI-SearchBot and ChatGPT-User) preserves ChatGPT browsing-mode visibility while opting out of training. Most brands should allow GPTBot — AI training is the foundation of long-term AI visibility.
Anthropic uses three separate user-agents. ClaudeBot crawls for training data. Claude-SearchBot indexes for real-time Claude search. Claude-User performs real-time fetches when a Claude user asks about a specific URL during conversation. Blocking ClaudeBot stops training but Claude-User can still fetch your page on demand. Blocking Claude-User means Claude users get "I couldn't fetch that URL" responses. Most brands should allow all three.
Apple uses two crawler user-agents. Applebot crawls for Spotlight search and basic Siri results — most sites have always allowed this. Applebot-Extended is the newer, separate user-agent for Apple Intelligence training (the on-device LLM + Apple cloud AI). Many robots.txt files written before 2024 don't include it. Blocking Applebot-Extended silently removes your brand from Apple Intelligence training and reduces visibility in Siri, the Apple-Intelligence-powered Search bar, and the on-device assistant.
Google's robots.txt tester (now part of Search Console) checks whether Googlebot specifically can crawl a URL. It doesn't audit AI bots. This tool checks 64 AI search and training bots in one pass: Googlebot, GPTBot, ClaudeBot, PerplexityBot, Bingbot, Applebot-Extended, plus 58 others. Use both: Google's tester for Search-specific debugging, this auditor for AI-visibility coverage.
The score is (allowed bots / total bots) × 100, where "not mentioned" counts half (the bot inherits the wildcard, which usually allows, but explicit allow is the canonical signal). A score of 90+ means your robots.txt explicitly allows essentially every reputable AI bot. Below 50 means structural gaps that AI engines will hit when trying to reach your content.
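The formula above, written out as a minimal TypeScript sketch. The `auditScore` name is hypothetical, rounding to a whole number and scoring an empty list as 0 are assumptions, and every bot is weighted equally exactly as this formula states; any per-engine weighting is not modeled.

```typescript
type BotStatus = "Allow" | "Blocked" | "Not mentioned";

// Score = (explicit allows + 0.5 × not mentioned) / total bots × 100, rounded.
function auditScore(statuses: BotStatus[]): number {
  if (statuses.length === 0) return 0; // assumption: the formula is undefined here
  let points = 0;
  for (const s of statuses) {
    if (s === "Allow") points += 1;
    else if (s === "Not mentioned") points += 0.5; // inherits the wildcard: half credit
    // "Blocked" earns nothing.
  }
  return Math.round((points / statuses.length) * 100);
}

// 6 explicit allows, 2 not mentioned, 2 blocked: (6 + 0.5 × 2) / 10 × 100 = 70
const example: BotStatus[] = [
  "Allow", "Allow", "Allow", "Allow", "Allow", "Allow",
  "Not mentioned", "Not mentioned",
  "Blocked", "Blocked",
];
console.log(auditScore(example)); // 70
```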
No. Allowing the bot is necessary but not sufficient. The bot still needs to find your URLs (via sitemap), the content has to be valuable, the schema has to be complete, and the engine has to choose to cite you. But blocking a bot is a guaranteed way to be invisible to that engine — the robots.txt allowlist is the floor, not the ceiling.
It explicitly allows every reputable AI bot in dedicated User-agent blocks (not just the wildcard), disallows only gated paths (/admin, /api, /dashboard, /onboarding), references the sitemap at the bottom, and can include a Host directive for the canonical hostname (a Yandex-originated extension that Google and most other crawlers ignore). Static robots.txt files copied from 2018 SEO blogs almost always miss the new AI bots, such as Google-Extended, Applebot-Extended, and MistralAI-User. Run this auditor and use the fix-block output as a template.
Yes. Use the embed code on this page to add the auditor as an iframe on any page of your own site. SEO agencies and developer tooling sites embed it so clients can self-serve. There is no fee and no attribution requirement; the embedded version links back to BrandCited via the "Powered by" badge.
Quarterly at minimum. The AI bot list keeps changing: every few months another engine gets its own user-agent (Mistral's MistralAI-User, Cohere's cohere-training-data-crawler, etc.), and old robots.txt files silently miss it. Also re-audit after any site migration, CDN change, or framework switch (Next.js, WordPress, and Webflow all serve robots.txt differently).
Cite this tool
BrandCited robots.txt Checker for AI Crawlers. (2026). https://www.brandcited.ai/tools/robots-txt-auditor
robots.txt is one of 91 AI ranking factors BrandCited audits across 8 categories. Run a free scan to also check schema completeness, llms.txt configuration, content structure, entity recognition, and AI citation share-of-voice.