Why Are Pages Designated in My Robots.txt File Appearing in Google Search Engine Results?

Occasionally, things happen online that we don’t understand…something gets done even though we explicitly said we didn’t want it done.

One of these instances has to do with the Robots.txt file, which we discussed a couple of months ago. Robots.txt is a simple text file that webmasters put in the root directory of a website to instruct search engine spiders not to crawl a given webpage or directory.
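As a rough illustration of how a crawler reads these instructions, here is a minimal sketch using Python’s standard urllib.robotparser module. The Robots.txt contents, the /private/ directory, and the example.com URLs are made up for this example, and the helper name is_crawl_allowed is our own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical Robots.txt contents; the /private/ path is an assumption
# made up for illustration, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent crawl url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A page inside the disallowed directory versus a page outside it.
blocked = is_crawl_allowed(
    ROBOTS_TXT, "Googlebot", "https://www.example.com/private/report.html"
)
allowed = is_crawl_allowed(
    ROBOTS_TXT, "Googlebot", "https://www.example.com/index.html"
)
print("private page crawlable:", blocked)
print("home page crawlable:", allowed)
```

Note that this only answers “may this page be crawled?” It says nothing about whether the page’s address can still show up in search results.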

(Read our search engine optimization blog post from Sept. 10th to find out why you would want to do that)

But sometimes, those pages we instructed Google not to crawl appear in a search engine results page. How did that happen? I thought I told Google I didn’t want that page crawled?

It’s easy to spot one of these…the listing will only have a link to the page and will not include any kind of description.

As Matt Cutts, a software engineer and head of Google’s Webspam team, explains, Google always honors the request in your Robots.txt file…the feature has been around for years, and no bugs have caused it to malfunction in quite a long time.

Rather, Google will include a page it hasn’t crawled in its search results if other sites link to that page with relevant keyword anchor text. Lots of links to a page or website indicate to Google that the page is pretty important and, therefore, possibly very valuable to the user.

You can include a noindex meta tag in the head section of any page you do not want Google to include in a search engine results page. For this to work, the page must not also be blocked in Robots.txt, because the spider has to crawl the page in order to see the tag. Once the spider sees the tag, it will drop the page from any search results.
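To make this concrete, here is a small sketch of what the tag looks like and how a spider might detect it, written with Python’s standard html.parser module. The sample page is hypothetical, the class and function names are our own, and this is only an illustration of the idea, not how Google’s crawler is actually implemented.

```python
from html.parser import HTMLParser

# Hypothetical page; the robots meta tag sits inside the <head> section.
PAGE_HTML = """<!DOCTYPE html>
<html>
<head>
  <title>Hypothetical page</title>
  <meta name="robots" content="noindex">
</head>
<body><p>Content we do not want listed in search results.</p></body>
</html>
"""


class RobotsMetaFinder(HTMLParser):
    """Records whether a robots (or googlebot) meta tag says noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot"):
            # The content attribute may list several comma-separated directives.
            directives = {d.strip() for d in content.split(",")}
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a noindex robots meta tag."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


print("noindex found:", has_noindex(PAGE_HTML))
```

A check like this can be handy for confirming that the tag really made it into the pages you want kept out of the results.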

Watch the video below to learn more.
