robots.txt Tutorial - Usage
This file is used to ask robots to stay out of sections of your web site,
so that well-behaved robots won't read files in those areas.
1. What are these robots?
These are mostly automated programs that fetch content from many web sites for
a variety of purposes.
Search engines often call them spiders and send them out to
look for pages to include in their search results.
Some spammers also use this technology to harvest email addresses to send
their junk mail to. Other robots look for illegal files or content.
2. How do I create a robots.txt file?
The syntax is very limited and easy to understand. The first part names the robot
the rules apply to:
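User-agent: BotName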
Replace BotName with the robot name in question. To address
all of them, simply use an asterisk.
The second part consists of one or more Disallow lines telling the robot in question
not to enter certain parts of your web site.
In the example below, any path on our site starting with the string /cgi-bin/ is declared off limits.
Multiple paths can be excluded per robot by using several Disallow lines.
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /private
This robots.txt file would apply to all robots and instruct them to stay out of the
/cgi-bin/ and /temp/ directories.
It also tells them that any path on your site starting with /private (files and directories alike) is off limits.
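If you want to sanity-check how a compliant robot will interpret rules like these, Python's standard urllib.robotparser module can evaluate paths against a robots.txt file. Below is a minimal sketch; the robot name ExampleBot and the test paths are just placeholders:

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The * record applies to any robot name, including our placeholder.
print(parser.can_fetch("ExampleBot", "/cgi-bin/search.cgi"))  # False
print(parser.can_fetch("ExampleBot", "/private-notes.html"))  # False: prefix match on /private
print(parser.can_fetch("ExampleBot", "/index.html"))          # True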
To declare your entire website off limits to BotName, use the example shown below.
User-agent: BotName
Disallow: /
To have a generic robots.txt file that welcomes all robots and does not restrict them,
use this sample:
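User-agent: *
Disallow:

A Disallow line with an empty value means nothing is excluded, so every robot may fetch every path on the site.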
This beginner's tutorial includes a list of common robot names to get you started. Many others exist.