Recently, Google Webmaster Tools told me that it was unable to crawl my site because my robots.txt file was unreachable. Having a robots.txt file registered in Google Webmaster Tools that cannot be accessed is worse than having no robots.txt file at all.
A robots.txt file tells search engines whether or not they are allowed to crawl your website. Notice that I say tells search engines, rather than enables or disallows? This is because it’s more of a suggestion than a technical restriction or mandate, although search engines do usually adhere to it. A robots.txt file can also tell search engines the location of your sitemap.
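For illustration, a minimal robots.txt covering both of those jobs might look like this (the blocked path and the sitemap URL are made up for the example):

```text
# Rules apply to all crawlers
User-agent: *
Disallow: /wp-admin/

# Where the sitemap lives (hypothetical URL)
Sitemap: https://example.com/sitemap.xml
```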
Having a robots.txt file that Google knows about but can’t access will stop Google from crawling your website entirely, because the unreachable file may contain instructions it is expected to respect. If Google didn’t know about a robots.txt file at all, it would simply crawl the site as normal, assuming that the webmaster has no special requirements about how their site is accessed.
Enabling Google To Access The Robots.txt File and Crawl My Site
As mentioned before, I was getting the following error:
Google Couldn’t Crawl Your Site Because We Were Unable To Access The Robots.txt File
Other areas of Google Webmaster Tools were reporting the same error. Here’s how I tracked the problem down:
- Obviously, the first thing I did was check whether robots.txt existed. Since I use WordPress, I couldn’t look for the robots.txt file in the root of my website’s home directory, because it’s dynamically generated by PHP, so I simply visited the robots.txt URL to see if it was available.
- The robots.txt file was reachable for me and was displaying the correct content, so I checked that the .htaccess file sitting in the root of my site’s home directory was correct, which it was.
- As WordPress was dynamically generating the robots.txt file, there were no file permissions to check, though that would otherwise have been my next step. Instead, I contacted my hosting provider, who looked into it and realised that they were actually blocking Googlebot’s IP addresses.
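Since robots.txt rules are suggestions that well-behaved crawlers choose to honour, you can see how a crawler like Googlebot interprets them using Python’s standard-library `urllib.robotparser`. This is just a sketch: the rules and URLs below are made up, and the file is parsed from a string rather than fetched over the network so the example is self-contained.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, as a list of lines.
rules = [
    "User-agent: *",
    "Disallow: /wp-admin/",
]

parser = RobotFileParser()
parser.parse(rules)

# A well-behaved crawler checks each URL against the rules before fetching it.
print(parser.can_fetch("Googlebot", "https://example.com/wp-admin/"))  # False: disallowed
print(parser.can_fetch("Googlebot", "https://example.com/about/"))     # True: allowed
```

Note that `can_fetch` answers for the URL you give it; it doesn’t stop anyone from fetching the page, which is exactly why robots.txt is a suggestion rather than a technical restriction.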
This was fixed, which I confirmed by using Google Webmaster Tools’ Fetch as Google feature to access the robots.txt file. I then resubmitted my sitemap to Google, and my website was back in Google’s results two days later.