Google to Stop Supporting Robots.txt Noindex: What That Means for You


Effective September 1, 2019, Google will no longer support the robots.txt noindex directive. This means that Google may start indexing your webpage(s) if you’ve relied solely on a noindex rule in robots.txt to keep those pages out of the SERPs. You have until September 1st to remove the rule and switch to another method.

What is a robots.txt noindex? It’s an unofficial rule placed in your robots.txt file (not to be confused with the noindex meta tag, which goes in a page’s HTML) that tells search engines not to include the listed pages in the SERPs.
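
For reference, the retired rule typically looked something like this in a robots.txt file (an illustrative sketch; the path is just a placeholder):

User-agent: *
Noindex: /private-page/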

Why is Google no longer supporting it? Because the noindex robots.txt rule was never an official directive, and Google is retiring all code that handles unsupported and unpublished rules, noindex among them, on September 1, 2019 (the full statement is quoted below).


Recent Google Updates

Google has rolled out a number of updates in 2019. As a refresher, the most prominent ones are:

  • June 2019 core update. Google released an official statement saying: “Tomorrow, we are releasing a broad core algorithm update, as we do several times per year. It is called the June 2019 Core Update. Our guidance about such updates remains as we’ve covered before.”
  • Diversity update. This smaller June update impacts transactional searches the most. As per the update, Google now aims to return results from unique domains and will no longer display more than two results from the same domain.
  • March 2019 core update. This is another broad change to its algorithm. Google confirmed this update, but did not provide a name, so it’s been referred to as either the Florida 2 update or the Google 3/12 broad core update. There was no new guidance given for this update.


Goodbye to Google’s Robots.txt Noindex Directive

Now, in July 2019, Google has bid adieu to undocumented and unsupported rules in robots.txt, announcing the change in a tweet on July 2nd, 2019.

If your website uses the noindex directive in the robots.txt file, then you’ll need to use other options. As per the statement published on the official Google Webmaster Central Blog:

“In the interest of maintaining a healthy ecosystem and preparing for potential future open source releases, we’re retiring all code that handles unsupported and unpublished rules (such as noindex) on September 1, 2019.”

The reasoning behind dropping support for robots.txt noindex was also discussed on the Google blog:

“In particular, we focused on rules unsupported by the internet draft, such as crawl-delay, nofollow, and noindex. Since these rules were never documented by Google, naturally, their usage in relation to Googlebot is very low. Digging further, we saw their usage was contradicted by other rules in all but 0.001% of all robots.txt files on the internet. These mistakes hurt websites’ presence in Google’s search results in ways we don’t think webmasters intended.”

Robots.txt – The Robots Exclusion Protocol (REP)

The Robots Exclusion Protocol (REP), better known as robots.txt, has been in use since 1994 but was never turned into an official Internet standard. Without a proper standard, webmasters and crawlers alike have had to guess at what gets crawled, and the REP was never updated to cover today’s corner cases.

As per the official Google blog:

“REP was never turned into an official Internet standard, which means that developers have interpreted the protocol somewhat differently over the years. And since its inception, the REP hasn’t been updated to cover today’s corner cases. This is a challenging problem for website owners because the ambiguous de-facto standard made it difficult to write the rules correctly.”

To end this confusion, Google has documented how the REP is used on the web and submitted it to the IETF (Internet Engineering Task Force), the open standards organization whose mission is to make the Internet work better.

Google said in an official statement:

“We wanted to help website owners and developers create amazing experiences on the internet instead of worrying about how to control crawlers. Together with the original author of the protocol, webmasters, and other search engines, we’ve documented how the REP is used on the modern web, and submitted it to the IETF.”

What This Means for You

If you use noindex in your robots.txt file, Google will no longer honor it. Google had been honoring some of those implementations, but as John Mueller has reminded webmasters, it was never an officially supported directive.

You’ll see a notification in Google Search Console if you continue using noindex in your robots.txt files.


Alternatives to Using the Robots.txt Indexing Directive

If your website still relies on the robots.txt noindex directive, then that needs to change, because Googlebot will stop following the rule on September 1, 2019. But what should you use instead? Here are some alternatives:

1) Block Search Indexing with ‘noindex’ Meta Tag

To prevent search engine crawlers from indexing a page, you can use the ‘noindex’ meta tag and add it to the <head> section of your page:

<meta name="robots" content="noindex">

Alternatively, you can send an X-Robots-Tag HTTP response header instructing crawlers not to index a page:

HTTP/1.1 200 OK
(…)
X-Robots-Tag: noindex
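
How you add that header depends on your server. On an Apache server with mod_headers enabled, for example, a configuration rule along these lines could attach the header to certain files (a minimal sketch; the .pdf pattern is only an example):

<Files ~ "\.pdf$">
  # Ask crawlers not to index matching files
  Header set X-Robots-Tag "noindex"
</Files>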

2) Use 404 and 410 HTTP Status Codes

410 is the status code that is returned when the target resource is no longer available at the origin server.

As HTTPstatuses points out:

“The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed.”

The 404 status code is similar to 410. In the words of John Mueller:

“From our point of view, in the mid term/long term, a 404 is the same as a 410 for us. So in both of these cases, we drop those URLs from our index.

We generally reduce crawling a little bit of those URLs so that we don’t spend too much time crawling things that we know don’t exist.

The subtle difference here is that a 410 will sometimes fall out a little bit faster than a 404. But usually, we’re talking on the order of a couple days or so.

So if you’re just removing content naturally, then that’s perfectly fine to use either one. If you’ve already removed this content long ago, then it’s already not indexed so it doesn’t matter for us if you use a 404 or 410.”
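
If your site runs on Apache, for instance, returning a 410 for a removed page can be done with a simple mod_alias rule (a minimal sketch; the path is just a placeholder):

# .htaccess – respond with "410 Gone" for a page that has been removed
Redirect gone /old-page/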

3) Use Password Protection

You can hide a page behind a login, because Google does not index pages that are hidden behind paywalls or logins.
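
On an Apache server, for instance, one common way to do this is HTTP basic authentication via an .htaccess file (a minimal sketch; the realm name and file path are placeholders, and the referenced .htpasswd file must exist):

# .htaccess – require a login before the page is served
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /path/to/.htpasswd
Require valid-user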

4) Disallow Robots Using Robots.txt

You can use the disallow directive in your robots.txt file to tell search engines not to crawl specific pages. Keep in mind that disallow blocks crawling rather than indexing, so a disallowed URL can still show up in results if other pages link to it.
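
A minimal disallow rule looks like this (the path is just a placeholder):

User-agent: *
Disallow: /private-page/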

In the words of Google:

“While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.”

5) Use the Search Console Remove URL Tool

You can use the Search Console Remove URL Tool to remove a URL temporarily from the search results. This block will last for 90 days. If you wish to make the block permanent, then you can use any one of the four methods suggested above.


Last Word

If you want to learn more about how to get your content removed from the Google search results, then head over to the Google Help Center.




