robots.txt Tutorial - Robot List
Several different types of robots are constantly crawling the World Wide Web.
The best-known ones are classified below.
1. Search Engine Robots
Below is a list of the most popular and active search engine bots (also called spiders) at this time.
Robot Name | Search Engine |
Googlebot | Google |
Googlebot-Image | Google Images |
Slurp | Inktomi |
ZyBorg | WiseNut/LookSmart |
fast | Fast/AllTheWeb |
Openbot | OpenFind |
Scooter | Alta Vista |
If you exclude any part of your site from these robots, that part will
also be excluded from the corresponding search results.
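As a minimal sketch of such an exclusion, assume a hypothetical site at example.com whose /images/ directory should stay out of Google Images. The site name, the path, and the test URLs are illustrative assumptions; the check uses Python's standard urllib.robotparser module to read the record the way a well-behaved robot would.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot-Image out of /images/,
# leave every other robot unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The image robot is refused inside /images/ ...
image_bot_allowed = parser.can_fetch(
    "Googlebot-Image", "http://example.com/images/logo.gif")
# ... while the regular Googlebot falls through to the "*" record.
googlebot_allowed = parser.can_fetch(
    "Googlebot", "http://example.com/images/logo.gif")

print(image_bot_allowed)   # False
print(googlebot_allowed)   # True
```

With this record, the images would drop out of Google Images results while the pages themselves remain eligible for the regular Google index.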
2. Bots Used By Spammers
Unless you enjoy receiving lots of SPAM, you don't want these on your web site.
They look for email addresses on web pages to send their junk email to.
Start Of User-Agent String |
EmailSiphon |
EmailWolf |
ExtractorPro |
CherryPicker |
NICErsPRO |
Teleport |
EmailCollector |
These will ignore the robots.txt file as they want to find new email addresses by any means possible.
There is a way to refuse them access to your site.
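The page does not spell that method out here. One common approach is to refuse requests on the server side when the User-Agent header starts with one of the strings listed above. The sketch below assumes a Python WSGI application; the names is_spam_bot and block_spam_bots are hypothetical, and the case-insensitive comparison is an assumption rather than something the list specifies.

```python
# User-agent prefixes taken from the list above.
SPAM_BOT_PREFIXES = (
    "EmailSiphon", "EmailWolf", "ExtractorPro", "CherryPicker",
    "NICErsPRO", "Teleport", "EmailCollector",
)

# Lower-cased once so the comparison is case-insensitive (an assumption).
_PREFIXES_LOWER = tuple(prefix.lower() for prefix in SPAM_BOT_PREFIXES)


def is_spam_bot(user_agent):
    """Return True if the User-Agent string starts with a listed prefix."""
    return user_agent.lower().startswith(_PREFIXES_LOWER)


def block_spam_bots(app):
    """Hypothetical WSGI middleware: answer 403 to the listed harvesters."""
    def middleware(environ, start_response):
        if is_spam_bot(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

A harvester can send a different User-Agent string, so a check like this only stops robots that identify themselves honestly.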
3. Others
These claim to respect the robots.txt file and you can block them (if you wish) by robot name, as usual.
Robot Name | Purpose |
TurnitinBot | Detects Plagiarism |
NPBot | Intellectual Property |
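Blocking by robot name means one robots.txt record per robot, each with a site-wide Disallow rule. The sketch below generates those records for the two robots in the table and checks them with Python's standard urllib.robotparser module; the helper name build_block_records and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_block_records(robot_names):
    """Return robots.txt text that disallows the whole site for each name."""
    records = [f"User-agent: {name}\nDisallow: /\n" for name in robot_names]
    # Records are separated by a blank line.
    return "\n".join(records)


robots_txt = build_block_records(["TurnitinBot", "NPBot"])

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A named robot is refused everywhere; unnamed robots are unaffected.
turnitin_allowed = parser.can_fetch(
    "TurnitinBot", "http://example.com/essay.html")
other_allowed = parser.can_fetch(
    "Googlebot", "http://example.com/essay.html")
```

This only works for robots that honor robots.txt, which the two robots above claim to do.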
The robots in the following list have to be banned from your web site
hosting account in another way, if you wish to do so.
User-Agent Start | Purpose |
LinkWalker | Link Directory Builder |
Zeus | Link Directory Builder |
These robots look for reciprocal link partners. If you're interested in that
type of venture, do not block them.