Unreachable Robots.txt – Google Couldn’t Crawl Your Site Because We Were Unable To Access The Robots.txt File

Recently, Google Webmaster Tools told me that it was unable to crawl my site because my robots.txt file was unreachable. Having a robots.txt file registered in Google Webmaster Tools that cannot be accessed is worse than having no robots.txt file at all.

A robots.txt file tells search engines whether they are allowed to crawl your website. Notice that I say tells search engines, rather than enables or disallows? This is because it’s more of a suggestion than a technical restriction, although search engines do usually adhere to it. Robots.txt files can also tell search engines the location of your sitemap.
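For reference, a minimal robots.txt might look like the following; the Disallow path and sitemap URL are placeholders for illustration:

    User-agent: *
    Disallow: /wp-admin/
    Sitemap: https://www.example.com/sitemap.xml

User-agent names the crawler the rules apply to (* means all crawlers), Disallow lists paths crawlers are asked to stay out of, and Sitemap tells them where to find your sitemap.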

Having a robots.txt file that Google knows about but can’t access leaves Google unable to crawl your website, because there are instructions about the site that it cannot reach. If Google didn’t know about the robots.txt file, it would simply crawl the site as normal, assuming that the webmaster had no special privacy requirements about access to their site.

Enabling Google To Access The Robots.txt File And Crawl My Site

As mentioned before, I was getting the following error:

Google Couldn’t Crawl Your Site Because We Were Unable To Access The Robots.txt File

Other areas of Google Webmaster Tools were reporting this error:

Robots.txt unreachable

  1. Obviously the first thing I did was to check whether robots.txt existed. Since I use WordPress, I couldn’t access the robots.txt file in the root of my website’s home directory, because it’s dynamically generated by PHP, so I simply went to the robots.txt URL to see if it was available.
  2. The robots.txt file was perfectly reachable for me and displayed the correct content, so I checked that the .htaccess file sitting in the root of my site’s home directory was correct, which it was.
  3. As WordPress was dynamically generating the robots.txt file, I couldn’t check the file permissions on it, but this would have been my next step. Instead, the next thing I did was contact my hosting provider, who looked into it and realized that they were actually blocking Googlebot’s IP addresses. A request that succeeds in your browser can still fail for crawler traffic, as the sketch after this list illustrates.
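Since a page that loads fine in a browser can still be rejected when the request looks like crawler traffic, one quick external check is to fetch the robots.txt URL while presenting Googlebot’s user-agent string. Here is a minimal Python sketch, assuming a placeholder URL; note that it only mimics Googlebot’s user-agent, not its IP addresses, so an IP-level block like the one my host had in place would still slip past it:

    import urllib.error
    import urllib.request

    # Hypothetical URL; substitute your own site's robots.txt address
    ROBOTS_URL = "https://www.example.com/robots.txt"

    # Present Googlebot's user-agent string, since a firewall or plugin
    # may treat crawler traffic differently from a browser request
    request = urllib.request.Request(
        ROBOTS_URL,
        headers={"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                               "+http://www.google.com/bot.html)"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print(response.status, response.reason)
            print(response.read().decode("utf-8", errors="replace"))
    except urllib.error.HTTPError as err:
        # A 4xx/5xx here, but not in a browser, suggests user-agent blocking
        print("Blocked or error:", err.code, err.reason)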

This was fixed, which I confirmed by using Google Webmaster Tools’ Fetch as Google feature to access the robots.txt file. I then resubmitted my sitemap to Google, and my website was available on Google two days later.
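If you suspect that your host or a firewall is blocking Googlebot, the verification method Google documents is a reverse DNS lookup on the suspect IP followed by a matching forward lookup: genuine Googlebot addresses resolve to hostnames under googlebot.com or google.com, which in turn resolve back to the same IP. A rough Python sketch; the address at the bottom is just an illustrative example pulled from an access log:

    import socket

    def is_googlebot(ip):
        """Verify an IP using Google's reverse-then-forward DNS check."""
        try:
            # Reverse lookup: a genuine Googlebot IP resolves to a host
            # under googlebot.com or google.com
            host, _, _ = socket.gethostbyaddr(ip)
            if not host.endswith((".googlebot.com", ".google.com")):
                return False
            # Forward lookup: the hostname must resolve back to the same IP
            _, _, addresses = socket.gethostbyname_ex(host)
            return ip in addresses
        except (socket.herror, socket.gaierror):
            return False

    # Example usage with an illustrative address
    print(is_googlebot("66.249.66.1"))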


4 thoughts on “Unreachable Robots.txt – Google Couldn’t Crawl Your Site Because We Were Unable To Access The Robots.txt File”

  1. A few years ago I had this issue on one of my sites; in my case it was a WordPress plugin… I wish I could remember which plugin it was… all I can remember now is that it was a plugin related to spam.

    • One thing I was concerned about is a plugin I’m using on the top-level domain (which hosts another WP site). This plugin redirects all unregistered users to the login page, including Googlebot’s attempts to read the robots.txt page!

      • Oh that would be frustrating.

        I loved that WP spam plugin I used; I still can’t remember its name. I believe I blogged about it in the past. It did an amazing job of blocking spam. Unfortunately, it viewed Googlebot as spam. Now that I think about it, sometimes legitimate comments (by certain users) would disappear as well.

        • If you want a good comment spam blocker, Akismet is probably the way to go 🙂

