In the vigorous chase after traffic and ranking, it may not always be obvious that there are situations when you might not want search engines crawling parts of your website. There are admin folders where sensitive information is stored, for example, and other parts of your site that you don’t want crawlers to access at all.
How Does The robots.txt File Work?
Essentially, what you do is plop it in your root directory and tell the search engine crawlers which pages you wish them not to visit. It is not a block as such, and it is not binding, but most search engine crawlers will politely follow this request.
Main Advantages of Using robots.txt
What if you have nothing you wish to block the crawlers from? Is there still a good reason to use this file? You bet. It serves other functions that can positively influence your on-page optimization and ranking.
Here are two common examples:
- Pages Get Indexed Faster. When your neighborhood crawler doesn’t waste time on non-essential content, your other pages tend to get crawled and indexed sooner.
- No Duplicate Content Penalty. On a large website, there’s a good chance of duplicate copy somewhere, such as a printer-friendly version of a page alongside the version meant for viewing. The search engine might classify this as duplicate content unless you tell it to skip one of the versions.
So How Do You Set Up A robots.txt File?
First of all, as the extension implies, it is just a plain “.txt” file. You can create one on your desktop with any text editor. Call it “robots.txt” (all lowercase; the name is case sensitive, so “Robots.TXT” won’t work) and upload it to your website’s root directory.
The location is also important. The crawler always looks for your robots.txt file first, but only in the root directory. If the file is anywhere else, it will not be used, even if the crawler comes across it later.
The file content is simple, and you don’t need to know code to write it. It consists of “User-agent” lines and “Disallow” directives. That’s how simple it is.
“User-agent:” names the search engine crawler or bot that the rules apply to.
“Disallow:” names the files or directories you wish to define as “off limits”.
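Put together, the two directives form a rule group. The sketch below is only an illustration: the “/admin/” path is a hypothetical placeholder, so substitute a directory that actually exists on your site. Lines starting with “#” are comments that crawlers ignore.

```txt
# Which crawler the rules below apply to ("*" matches all of them)
User-agent: *
# What that crawler is asked not to visit (hypothetical path)
Disallow: /admin/
```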
Examples of How To Use robots.txt
- Disallow All Search Engines from Indexing the Whole Domain
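A minimal sketch of this rule, using only the two directives described above:

```txt
# Ask every crawler to stay out of the whole site
User-agent: *
Disallow: /
```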
In this example, “*” means “all user agents” and “/” is the root directory (and everything in it). This is useful when the site is already online but not yet finished: some testing and structuring still needs to be done, so the site is live, but you don’t want it indexed in its incomplete form.
- Exclude a Specific Robot from the Site
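A minimal sketch of this rule, naming a single crawler instead of using the wildcard:

```txt
# Ask one named crawler to stay out of the whole site
User-agent: Googlebot-Image
Disallow: /
```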
Here, an agent called “Googlebot-Image” is disallowed from the entire site. Add a more specific path after the “/” to name only the parts of the site you’d like to disallow.
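As a rough illustration of a more specific location, the variant below asks the same crawler to skip a single directory rather than the whole site. The “/photos/private/” path is a hypothetical placeholder.

```txt
User-agent: Googlebot-Image
# Hypothetical directory; everything else stays open to this crawler
Disallow: /photos/private/
```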
Now you can go and customize this file, disallowing parts of the site for all user agents, or the entire site for some of them, in whichever combination suits your current needs.
To finish this tutorial, here’s an example of a robots.txt code addressing multiple user-agents and giving specific “disallow” instructions:
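A file of that shape might look like the sketch below. The five section paths are hypothetical placeholders, not taken from a real site; only the two crawler names come from the description that follows.

```txt
# Rules for all crawlers: five site sections (hypothetical paths)
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /admin/
Disallow: /print/
Disallow: /drafts/

# Two specific crawlers asked to stay out entirely
User-agent: MJ12bot
Disallow: /

User-agent: ShopWiki
Disallow: /
```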
In this example, all (“*”) user agents are restricted from a list of five specific site sections, and two specific crawlers (“MJ12bot” and “ShopWiki”) are politely asked to stay out of the site entirely.
We hope you found this information helpful. For more advice or to get the help of a professional Toronto SEO agency, contact us now. We are here for you.