The ultimate guide to robots.txt • Yoast

Joost de Valk

Joost de Valk is the founder and Chief Product Officer of Yoast and the Marketing & Communication Lead for WordPress.org. He’s a digital marketer, developer and an Open Source fanatic.

The robots.txt file is one of the main ways of telling a search engine where it can and can’t go on your website. All major search engines support the basic functionality it offers, and some of them also respond to extra rules that can be useful. This guide covers all the ways to use robots.txt on your website. It looks simple, but any mistake you make in your robots.txt can seriously harm your site, so make sure you read and understand the whole of this article before you dive in.

What is a `robots.txt` file?

Disallow: /*?*

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow: /not-for-bing/
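To see how a crawler might pick one of the three groups above, here is a minimal sketch using Python's standard `urllib.robotparser`. The URLs on `example.com` and the `SomeOtherBot` name are hypothetical, and real search engines may resolve groups differently from this library, so treat it as an illustration and not as a reference implementation.

```python
from urllib.robotparser import RobotFileParser

# The three groups from the example above, one rule set per crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow: /not-for-bing/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical crawler names and URLs, used only for illustration.
checks = [
    ("Googlebot", "https://example.com/any-page/"),
    ("bingbot", "https://example.com/any-page/"),
    ("bingbot", "https://example.com/not-for-bing/page.html"),
    ("SomeOtherBot", "https://example.com/any-page/"),
]
for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))
```

In this sketch a crawler with its own group follows only that group, and any crawler without one falls back to the `*` group, which blocks the whole site.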

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Search engine	Field	User-agent
Baidu	General	`baiduspider`
Baidu	Images	`baiduspider-image`
Baidu	Mobile	`baiduspider-mobile`
Baidu	News	`baiduspider-news`
Baidu	Video	`baiduspider-video`
Bing	General	`bingbot`
Bing	General	`msnbot`
Bing	Images & Video	`msnbot-media`
Bing	Ads	`adidxbot`
Google	General	`Googlebot`
Google	Images	`Googlebot-Image`
Google	Mobile	`Googlebot-Mobile`
Google	News	`Googlebot-News`
Google	Video	`Googlebot-Video`
Google	AdSense	`Mediapartners-Google`
Google	AdWords	`AdsBot-Google`
Yahoo!	General	`slurp`
Yandex	General	`yandex`

User-agent: * 
Disallow: /

User-agent: * 
Disallow:

User-agent: googlebot 
Disallow: /Photo

Disallow: /*.php 
Disallow: /copyrighted-images/*.jpg

Disallow: /*.php$
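The difference between the wildcard patterns above can be sketched with a small matcher. This is an illustration under the common convention that `*` matches any run of characters and a trailing `$` anchors the end of the URL; the helper names `rule_to_regex` and `rule_matches` are hypothetical, and Python's built-in `urllib.robotparser` does not interpret these wildcards itself.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL; every other character is literal.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal chunk, then rejoin the chunks with '.*'.
    body = ".*".join(re.escape(chunk) for chunk in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Return True if the pattern matches from the start of url_path."""
    return rule_to_regex(rule_path).match(url_path) is not None


# Without '$' the pattern also catches URLs with a query string.
print(rule_matches("/*.php", "/index.php?lang=en"))
# With '$' only URLs that end in .php are caught.
print(rule_matches("/*.php$", "/index.php?lang=en"))
```

The practical upshot is that `/*.php` blocks `/index.php?lang=en` as well as `/index.php`, while `/*.php$` only blocks URLs that end exactly in `.php`.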

Disallow: /wp-admin/ 
Allow: /wp-admin/admin-ajax.php
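How an `Allow` line can carve an exception out of a broader `Disallow` depends on the crawler's precedence rule. The sketch below assumes longest-match precedence, as described in RFC 9309: the matching rule with the longest path wins, and a tie goes to `Allow`. The `is_allowed` function and `WP_RULES` list are hypothetical names, and the sketch handles plain path prefixes only, not wildcards.

```python
def is_allowed(rules: list[tuple[str, str]], url_path: str) -> bool:
    """Decide access using longest-match precedence among prefix rules.

    `rules` holds (directive, path) pairs such as ("disallow", "/wp-admin/").
    Assumption: the matching rule with the longest path wins, and a tie
    goes to "allow". Wildcards are not handled in this sketch.
    """
    best_len = -1
    allowed = True  # no matching rule means the URL may be crawled
    for directive, path in rules:
        if not path or not url_path.startswith(path):
            continue  # empty or non-matching rules do not apply
        is_allow = directive.lower() == "allow"
        if len(path) > best_len or (len(path) == best_len and is_allow):
            best_len = len(path)
            allowed = is_allow
    return allowed


# The two rules from the example above.
WP_RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/blog/"):
    print(path, is_allowed(WP_RULES, path))
```

Parsers that apply the first matching rule instead of the longest one can reach a different verdict for `admin-ajax.php`, so it is worth checking how the crawlers you care about resolve such conflicts.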

host: example.com

crawl-delay: 10

What is a robots.txt file?

Crawl directives

What does the robots.txt file do?

humans.txt

Where should I put my robots.txt file?

Pros and cons of using robots.txt

Pro: managing crawl budget

A note on blocking query parameters

Con: not removing a page from search results

Noindex directives

Con: not spreading link value

robots.txt syntax

WordPress robots.txt

The User-agent directive

The most common user agents for search engine spiders

The Disallow directive

How to use wildcards/regular expressions

Non-standard robots.txt crawl directives

The Allow directive

The host directive

The crawl-delay directive

The sitemap directive for XML Sitemaps

Validate your robots.txt
