robots.txt Template
A robots.txt file is a plain text file that tells web crawlers (such as Googlebot) which pages or sections of your website they may access. It is advisory: well-behaved crawlers respect it, but it is not an access-control mechanism.
The robots.txt file must be placed in the root directory of your website (e.g., www.example.com/robots.txt).
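The file is made up of groups, each starting with a User-agent line naming a bot (or * for all bots) followed by one or more rules. A minimal sketch, with a purely illustrative path:
User-agent: Googlebot
Disallow: /drafts/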
Allow All Crawlers to Access Everything
User-agent: *
Disallow:
Specify Sitemap (Optional)
Listing a sitemap URL helps crawlers discover all of the site's pages.
Sitemap: https://www.example.com/sitemap.xml
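A robots.txt file may also list more than one Sitemap line if the site has several sitemaps. The URLs below are hypothetical:
Sitemap: https://www.example.com/sitemap-posts.xml
Sitemap: https://www.example.com/sitemap-pages.xml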
Blocking a Specific File or Directory
Specific files and directories can be excluded from crawling. Note that robots.txt controls crawling, not indexing: a blocked URL can still appear in search results if other pages link to it.
Disallow: /admin/
Disallow: /private-page.html
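To make an exception inside a blocked directory, major crawlers such as Googlebot and bingbot also honor an Allow directive, though it was not part of the original standard. The paths here are illustrative:
User-agent: *
Disallow: /admin/
Allow: /admin/help/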
Blocking Only a Specific Bot
A single bot can be blocked from the entire site while all other bots remain allowed.
User-agent: YandexBot
Disallow: /
User-agent: *
Disallow:
Crawl Delay
Instruct bots to wait 10 seconds between requests to reduce server load. Crawl-delay is non-standard: some crawlers (such as bingbot) honor it, but Googlebot ignores it. The directive must appear within a User-agent group.
User-agent: *
Crawl-delay: 10
Example robots.txt
An example of a more complete robots.txt file combining the rules above. All rules for * are kept in a single group, since some parsers apply only one matching group per bot.
# Block Twitterbot from the entire site
User-agent: Twitterbot
Disallow: /

# Rules for all other bots, kept in one group
User-agent: *
# Block the /admin/ directory
Disallow: /admin/
# Block specific file types
Disallow: /*.pdf$
Disallow: /*.zip$
Disallow: /*.mp4$
# Block specific files
Disallow: /config.php
Disallow: /documents/company_profile.jpg
# Wait 10 seconds between requests (reduces server load; ignored by Googlebot)
Crawl-delay: 10

# Location of the sitemap for SEO
Sitemap: https://www.example.com/sitemap.xml
Common Bots
Common bots and their user-agent tokens, which can be used in User-agent lines as shown in the example after the lists.
Search Engine Bots
Google: Googlebot
Bing: bingbot
Yahoo: Slurp
Baidu: Baiduspider
Yandex: YandexBot
DuckDuckGo: DuckDuckBot
Sogou: Sogou web spider
Social Media Bots
Facebook: facebookexternalhit
Twitter: Twitterbot
LinkedIn: LinkedInBot
Monitoring and SEO Bots
Ahrefs: AhrefsBot
SEMrush: SEMrushBot
Moz: rogerbot
Archive and Data Collection Bots
Wayback Machine: ia_archiver
Common Crawl: CCBot
Other Bots
Apple: Applebot
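For example, to block the SEO crawlers listed above while leaving search engine bots untouched:
User-agent: AhrefsBot
Disallow: /

User-agent: SEMrushBot
Disallow: /

User-agent: rogerbot
Disallow: /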