Search engines like Google use web crawlers, sometimes called web robots or bots, to archive and categorize websites. Most bots are configured to look for a robots.txt file on the server before they read any other file from the website. They do this to check whether the website's owner has given special instructions on how to crawl and index the site.
The robots.txt file contains a set of instructions that ask bots to ignore specific files or directories. This may be for privacy reasons, or because the website owner believes that the contents of those files and directories are irrelevant to how search engines categorize the website.
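As a rough illustration of how a well-behaved bot might interpret such instructions, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` name, the `example.com` URLs, and the `is_allowed` helper are all made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/notes.html
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/data.html", "/drafts/notes.html"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A real crawler would download the file from the site rather than use a hard-coded string, and would run this check before requesting each page.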
If a website has more than one subdomain, each subdomain must have its own robots.txt file. It is important to note that not all bots will honor a robots.txt file. Some malicious bots will even read the robots.txt file to find which files and directories they should target first. Also, even if a robots.txt file instructs bots to ignore specific pages on the site, those pages may still appear in search results if they are linked to by other pages that are crawled.
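Because the file is looked up separately for each subdomain, a bot can work out where to find it from the address of the page it wants to crawl. The following is a minimal sketch of that lookup, assuming the file sits at the root of each host; the `robots_txt_url` helper and the `example.com` hosts are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any subdomain), drop the rest.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Two subdomains of the same site resolve to two different files.
    print(robots_txt_url("https://www.example.com/products/item?id=7"))
    print(robots_txt_url("https://blog.example.com/posts/2020/hello"))
```

The two calls produce different addresses, which is why rules placed on one subdomain do not apply to another.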