Prominara Team
robots.txt for AI Crawlers: Complete Configuration Guide
Your robots.txt file controls which AI crawlers can access your content. Proper configuration is essential for AI visibility. This guide covers all major AI crawlers and provides copy-paste configurations. Also consider setting up an llms.txt file alongside your robots.txt.
Key Takeaways
- 8 major AI crawlers exist in 2026: GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Amazonbot, FacebookBot, and Applebot-Extended. Allow the ones that matter for your audience.
- The recommended approach is selective access: allow public content (blog, docs, guides) while blocking sensitive areas (admin, API, dashboard).
- Blocking GPTBot keeps your content out of OpenAI's training data, and with it out of ChatGPT, the world's most popular AI assistant. Do not block it unless you have a specific legal or competitive reason.
- Crawl-delay is unreliable for AI bots. Not all crawlers respect it. Use server-side rate limiting instead if crawl volume is a concern.
- Review your robots.txt quarterly as new AI crawlers emerge regularly.
Understanding AI Crawlers
AI platforms use specialized crawlers to index web content for their models. Unlike traditional search engine crawlers that focus on indexing for search results, AI crawlers gather information to improve AI responses.
Major AI Crawlers
| Crawler | Platform | User-Agent |
|---|---|---|
| GPTBot | ChatGPT/OpenAI | GPTBot |
| ChatGPT-User | ChatGPT Browsing | ChatGPT-User |
| ClaudeBot | Claude/Anthropic | ClaudeBot |
| PerplexityBot | Perplexity | PerplexityBot |
| Google-Extended | Google AI/Gemini | Google-Extended |
| Amazonbot | Amazon Alexa | Amazonbot |
| FacebookBot | Meta AI | FacebookBot |
| Applebot-Extended | Apple AI | Applebot-Extended |
Basic Configurations
Allow All AI Crawlers
# AI Crawlers - Allow All
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: FacebookBot
Allow: /
User-agent: Applebot-Extended
Allow: /
Block All AI Crawlers
# AI Crawlers - Block All
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
Selective Access (Recommended)
# AI Crawlers - Selective Access
# Allow public content, block sensitive areas
User-agent: GPTBot
Allow: /blog/
Allow: /docs/
Allow: /guides/
Allow: /glossary/
Disallow: /admin/
Disallow: /api/
Disallow: /dashboard/
Disallow: /account/
User-agent: ChatGPT-User
Allow: /
Disallow: /admin/
Disallow: /api/
User-agent: ClaudeBot
Allow: /blog/
Allow: /docs/
Allow: /guides/
Disallow: /admin/
Disallow: /api/
User-agent: PerplexityBot
Allow: /
Disallow: /admin/
Disallow: /api/
User-agent: Google-Extended
Allow: /
Disallow: /admin/
Disallow: /api/
Platform-Specific Considerations
OpenAI (GPTBot)
Crawl behavior:
- Respects robots.txt
- Focuses on content quality
- Used for model training and ChatGPT browsing
Recommendation: Allow access to authoritative content you want cited.
Anthropic (ClaudeBot)
Crawl behavior:
- Respects robots.txt
- Less frequent crawling than GPTBot
- Used for Claude's knowledge
Recommendation: Allow same access as GPTBot.
Perplexity (PerplexityBot)
Crawl behavior:
- Very active crawler
- Real-time search focus
- Respects robots.txt
Recommendation: Allow broad access for search visibility.
Google (Google-Extended)
Crawl behavior:
- Separate from Googlebot (search)
- Used for Gemini/AI features
- Introduced in September 2023
Recommendation: Allow if you want Gemini/AI Overview visibility.
Advanced Configurations
Rate Limiting
# Crawl delay for AI bots (in seconds)
User-agent: GPTBot
Allow: /
Crawl-delay: 10
User-agent: ClaudeBot
Allow: /
Crawl-delay: 10
Note: Not all crawlers respect Crawl-delay.
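Since Crawl-delay support is inconsistent, enforcing limits server-side is the reliable option. A minimal sketch of that idea: a per-crawler token bucket checked before serving each request. The `CrawlerThrottle` class, the bot list, and the refill rates here are illustrative assumptions, not a prescribed implementation.

```python
import time

# Bot names and rate/burst values below are illustrative defaults.
AI_BOTS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")

class CrawlerThrottle:
    """Token bucket keyed by AI crawler name (hypothetical helper)."""

    def __init__(self, rate=0.1, burst=5):
        self.rate = rate      # tokens refilled per second
        self.burst = burst    # maximum bucket size
        self.buckets = {}     # bot name -> (tokens, last_refill_time)

    def allow(self, user_agent, now=None):
        """Return True if the request may proceed, False to answer HTTP 429."""
        bot = next((b for b in AI_BOTS if b in user_agent), None)
        if bot is None:
            return True       # not an AI crawler: no throttling
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(bot, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.buckets[bot] = (tokens, now)
            return False
        self.buckets[bot] = (tokens - 1, now)
        return True
```

In a real deployment this check would sit in middleware (or be replaced by your web server's native rate limiting), returning a 429 with a Retry-After header when `allow()` is False.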
Sitemap Reference
# Include sitemap reference
Sitemap: https://yoursite.com/sitemap.xml
User-agent: GPTBot
Allow: /
Combined with Traditional Bots
# Traditional Search Engines
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# AI Crawlers
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
# General Rules
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Sitemap: https://yoursite.com/sitemap.xml
Testing Your Configuration
1. Verify File Location
Your robots.txt must be at your domain root:
https://yoursite.com/robots.txt
2. Test Accessibility
Access the URL directly in a browser.
3. Validate Syntax
Use the robots.txt report in Google Search Console or another validator (Google retired its standalone robots.txt Tester in 2023).
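You can also check rules offline with Python's standard-library `urllib.robotparser`. The sketch below parses an abbreviated version of the selective-access example and confirms what GPTBot may fetch; the domain and paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated selective-access rules; paths and domain are illustrative.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /blog/
Disallow: /admin/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot may read the blog but not the admin area.
print(parser.can_fetch("GPTBot", "https://yoursite.com/blog/post"))  # True
print(parser.can_fetch("GPTBot", "https://yoursite.com/admin/"))     # False
```

The same parser can be pointed at your live file with `set_url()` plus `read()` if you prefer to validate the deployed version.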
4. Monitor Crawl Logs
Check server logs for AI crawler activity.
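A quick way to do that check is to count hits per AI crawler from your access log. This sketch assumes combined log format, where the user-agent is the last quoted field; the sample lines and the `count_ai_hits` helper are made-up illustrations.

```python
import re
from collections import Counter

AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
           "Google-Extended", "Amazonbot", "FacebookBot", "Applebot-Extended"]

def count_ai_hits(log_lines):
    """Return a Counter of AI-crawler hits keyed by bot name."""
    hits = Counter()
    for line in log_lines:
        # In combined log format, the user-agent is the last quoted field.
        quoted = re.findall(r'"([^"]*)"', line)
        ua = quoted[-1] if quoted else ""
        for bot in AI_BOTS:
            if bot in ua:
                hits[bot] += 1
                break
    return hits

# Made-up example log lines:
sample = [
    '1.2.3.4 - - [01/Jan/2026:00:00:00 +0000] "GET /blog/ HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"',
    '5.6.7.8 - - [01/Jan/2026:00:00:05 +0000] "GET /docs/ HTTP/1.1" 200 1024 '
    '"-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]
print(count_ai_hits(sample))
```

Run against a real log, the counts show which crawlers are actually visiting and whether your allow/block rules are being respected.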
Common Mistakes
1. Blocking All Bots Accidentally
# WRONG - This blocks everything including AI bots
User-agent: *
Disallow: /
2. Incorrect File Location
Place robots.txt at domain root, not in subdirectories.
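For mistake 1 above, a safer pattern keeps the blanket block for unlisted bots while adding explicit groups for the crawlers you do want; under the Robots Exclusion Protocol, a crawler follows only the most specific matching User-agent group, so the named groups override the wildcard. The bots chosen here are just examples.

```
# Blanket rule for unlisted bots
User-agent: *
Disallow: /

# Explicit exceptions for AI crawlers you want
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
```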
3. Conflicting Rules
Under the Robots Exclusion Protocol (RFC 9309), conflicts between Allow and Disallow are resolved by the most specific (longest) matching path, not by rule order. Write unambiguous rules rather than relying on ordering, since older crawlers may interpret conflicts differently.
4. Missing AI Crawlers
Update your robots.txt as new AI crawlers emerge.
5. Not Updating After Site Changes
Review robots.txt when restructuring your site.
Implementation Checklist
- [ ] Identify which AI platforms matter for your business
- [ ] Decide on allow/block strategy
- [ ] Create or update robots.txt
- [ ] Place at domain root
- [ ] Validate syntax
- [ ] Test accessibility
- [ ] Monitor crawler activity
- [ ] Review quarterly
Framework-Specific Implementation
Next.js
Create public/robots.txt, or generate the file dynamically with an app/robots.ts metadata route (App Router).
WordPress
Use SEO plugins like Yoast or RankMath.
Static Sites
Add robots.txt to your build output directory.
Configure your robots.txt properly to maximize your AI visibility while protecting sensitive content.
Related Resources
- AI Crawler
- Claude
- Perplexity AI
- Perplexity SEO: How to Get Cited in Perplexity Search [2026]
- ChatGPT Optimization Guide [2026]: Get Cited by AI
- llms.txt Guide: How to Set Up Your File in 5 Steps [2026]
