Apr 18 2011

What is a robots.txt file?

Category: Computers & Internet


A robots.txt file is a plain text file, placed at the root of a website, that provides instructions to search engines. Webmasters use it to tell search engine robots which parts of a site they may crawl and index. It not only tells a search engine what it can find, it also (more importantly) asks robots to stay out of areas of a site that are considered private. A robots.txt file can be used to keep an entire website out of the index, to keep specific pages from being indexed, or to turn away a specific search engine robot. Compliance is voluntary: well-behaved robots honor the file, but it does not technically block anyone from requesting a page. The other weakness of a robots.txt file is that it is public. Anyone can open it and see which parts of your website you don't want indexed. It should not be used to hide files or information.
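Because the file sits at the root of a site, its address can be worked out from any page on that site. The following is a minimal Python sketch of that idea; the function name robots_url and the example.com address are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the address where a site's robots.txt would be expected.

    Keeps the scheme and host of page_url and replaces the path
    with /robots.txt, dropping any query string or fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address, used only as an example.
print(robots_url("https://www.example.com/blog/2011/04/post.html"))
# https://www.example.com/robots.txt
```

Opening that address in a browser shows the same file a robot would read, which is why the file is a poor place to list anything secret.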

Search engine robots, also known as spiders or crawlers, are the programs written and used by search engines (such as Yahoo, Google, and Bing) to explore the web and report what they find. Based on those results and a specific algorithm, a searchable index is formed. The robots.txt file will often include instructions for specific SE (search engine) bots, identified by their user agent names, that say which files or folders may be crawled and which are disallowed. See the examples below.
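To show how instructions for a specific robot sit alongside instructions for everyone else, here is a small sketch using Python's standard urllib.robotparser module. The robot names ExampleBot and OtherBot, the rules, and the helper function allowed are made up for illustration; real search engines publish their own user agent names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group aimed at a robot that identifies itself
# as "ExampleBot", and a catch-all group for every other robot.
rules = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(agent, path):
    """Return True if the parsed rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


print(allowed("ExampleBot", "/index.html"))        # False: shut out entirely
print(allowed("OtherBot", "/index.html"))          # True: only /private/ is off limits
print(allowed("OtherBot", "/private/notes.html"))  # False
```

A robot that finds a group naming it follows that group; any other robot falls back to the group marked with the asterisk.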

The following instructions tell all search engine robots that the webmaster does not want any part of the website crawled or indexed. The asterisk is a wildcard that matches every robot.

User-agent: *
Disallow: /

The following instructions tell all search engine robots that everything except the listed folders may be crawled and indexed.

User-agent: *
Disallow: /cgi-bin/
Disallow: /stats/
Disallow: /notes/
Disallow: /private/
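One way to check how a rule set like this is read is to feed it to Python's standard urllib.robotparser module and ask about a few paths. This is only a sketch of how that particular parser interprets the rules; the robot name AnyBot, the test paths, and the helper is_allowed are assumptions for illustration, and individual search engines may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The folder rules from the example above.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /stats/
Disallow: /notes/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_allowed(path, agent="AnyBot"):
    """Return True if `agent` (a hypothetical robot name) may fetch `path`."""
    return parser.can_fetch(agent, path)


print(is_allowed("/index.html"))      # True: not under a listed folder
print(is_allowed("/cgi-bin/search"))  # False: under /cgi-bin/
print(is_allowed("/private/"))        # False
print(is_allowed("/statistics/"))     # True: does not start with /stats/
```

In this parser each Disallow line is matched as a prefix of the requested path, so /stats/ covers everything inside that folder but leaves a similarly named folder such as /statistics/ untouched.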



