This strange-sounding name isn’t some alternate website personality.
A robots.txt file is a simple text file placed in the root directory of a website that is used to provide instructions to a search engine spider that crawls and indexes your website.
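Because the file always sits in the root directory, its address can be worked out from any page on the site. The following is a minimal sketch of that idea in Python; the helper name robots_txt_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the address where a spider would look for robots.txt.

    Keeps the scheme and host of the page and swaps the path for
    /robots.txt, dropping any query string or fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page deep inside a site
print(robots_txt_url("https://www.example.com/shop/item?id=3"))
# -> https://www.example.com/robots.txt
```

A file placed anywhere else, such as inside a subfolder, would not be found at this address and so would not be read as the site's robots.txt.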
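Specifically, the file tells search engine spiders, which are actually computer programs, which pages NOT to crawl. Strictly speaking, it controls crawling rather than indexing: a blocked page can still show up in search results if other sites link to it.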
So why would you not want some of your pages crawled and indexed?
Well, these computer programs have limited time and resources. You want them to spend their time on the high-value pages of your site – ones with important content, product listings and sales pages.
Pages containing a shopping cart checkout, for example, are not that important, so you do not want the spider to waste valuable time and resources on them. Anything in your cgi-bin folder, and directories containing images or sensitive company info, shouldn’t be indexed either.
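That points to another common use of a robots.txt file – keeping well-behaved spiders out of sensitive areas. Search engine spiders will crawl and index just about anything they can get their hands on, including sensitive places like password files. Bear in mind, though, that robots.txt is a publicly readable request, not a security measure: it will not stop hackers, and it reveals the paths you list, so truly sensitive files also need real protection such as passwords or server access controls.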
One more very important thing about your robots.txt file – adding the following two lines (User-agent: * and Disallow: /) to your file tells every compliant search engine not to crawl any of your site. The asterisk is a wildcard meaning all spiders, and the forward slash in the Disallow line indicates the root directory, meaning everything you have.
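To see what those two lines do in practice, here is a small sketch that feeds them to Python's standard urllib.robotparser module and asks whether a spider may fetch a page. The spider name ExampleBot, the example.com addresses and the helper is_allowed are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two lines described above: every spider, everything blocked
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical spider and pages
print(is_allowed(BLOCK_ALL, "ExampleBot", "https://www.example.com/"))
# -> False
print(is_allowed(BLOCK_ALL, "ExampleBot", "https://www.example.com/products/widget.html"))
# -> False
```

Every address on the site comes back as blocked, which is why these two lines should only be used when you really do want the whole site left alone.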
To prevent only certain places on your site from being crawled and indexed, spell them out in “Disallow” lines, one path per line (e.g. Disallow: /cgi-bin/).
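A selective file can be checked the same way. This sketch, again using Python's standard urllib.robotparser module, blocks the cgi-bin folder plus a /checkout/ folder and leaves everything else open. The /checkout/ path, the spider name ExampleBot, the example.com addresses and the helper may_crawl are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Block two folders only; /checkout/ is a hypothetical path
SELECTIVE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /checkout/
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


site = "https://www.example.com"
print(may_crawl(SELECTIVE, "ExampleBot", site + "/cgi-bin/script.pl"))
# -> False
print(may_crawl(SELECTIVE, "ExampleBot", site + "/checkout/cart"))
# -> False
print(may_crawl(SELECTIVE, "ExampleBot", site + "/products/widget.html"))
# -> True
```

Each Disallow line matches any address that begins with the path it names, so one line per folder is enough to cover everything inside that folder.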
Of course, if you want every webpage on your site crawled and indexed, there is no need for a robots.txt file.










