What is robots.txt & How to create robots.txt file?
March 19, 2024 | Search Engine Optimization

What is a robots.txt File?
A robots.txt file is a plain text document located in a website’s root directory, serving as a set of instructions used by websites to tell search engines which pages should and should not be crawled.
A basic robots.txt file looks like this:
User-Agent: *
Disallow:
Sitemap: https://www.example.com/sitemap.xml
In this article, we'll cover these topics:

- What a robots.txt file is
- Why robots.txt files are important for websites
- How to create a robots.txt file
Why are robots.txt files important for websites?
The robots.txt file helps manage web crawler activity so you have more control over how your website is crawled.
Below are three main reasons to use a robots.txt file on a website:
1. Optimize Crawl Budget
Crawl budget refers to the number of pages that search engine bots will crawl and index on your website within a given time frame.
2. Block Duplicate Pages
Bots don't need to sift through every page on your website, because not all pages are meant to be served in the SERPs: internal search results pages, duplicate pages, and login and register pages, for example.
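A sketch of rules for this case might look as follows (the /search, /login, and /register paths are placeholders; adjust them to your site's actual URL structure):

```
User-agent: *
Disallow: /search
Disallow: /login
Disallow: /register
```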
3. Hide Resources
If you want to keep resources such as PDFs, videos, and images out of the SERPs, the robots.txt file can prevent them from being crawled. (Note that blocking crawling alone does not guarantee a URL will never be indexed if other pages link to it.)
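For instance, Google and Bing support the * and $ wildcards in paths, so a sketch like this (with illustrative paths) would block all PDF files and a hypothetical private-media folder:

```
User-agent: *
Disallow: /*.pdf$
Disallow: /private-media/
```

The $ anchors the match to the end of the URL, so only URLs ending in .pdf are blocked.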
How to Create a robots.txt File?
You can use a robots.txt generator tool or create this file yourself.
Here’s how:
1. Create a File & Name it robots.txt
Start by opening a plain .txt document in any text editor.
Next, name the file robots.txt.
What matters most is where the file lives: it must sit in the root directory of your domain, so that it can be found at https://www.example.com/robots.txt.
Note: A robots.txt file placed in a subdirectory of your domain (e.g., https://www.example.com/page/robots.txt) will not be honored by crawlers.
A basic robots.txt file looks like this:
User-Agent: *
Allow: /
Sitemap: https://www.example.com/sitemap.xml
2. Add Directives to the Robots.txt File
The file consists of one or more groups of directives, and each group consists of multiple lines of instructions.
Each group starts with a User-agent line that specifies which crawler the rules apply to.
Note: You can list rules for specific user agents first, then finish with the more general wildcard (*) group that matches all other crawlers and bots.
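For example, a group for one specific crawler followed by a general wildcard group might look like this (the paths here are illustrative placeholders):

```
User-agent: googlebot-image
Disallow: /photos/

User-agent: *
Disallow: /admin/
```

With these rules, Google's image crawler is kept out of /photos/, while every other bot is only kept out of /admin/.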
How to set your user-agent in the robots.txt file?
The next step is to set the user-agent in the robots.txt file. The user-agent identifies the bot you wish to allow or block. We have listed a few common bots and crawlers below, along with the search engines they belong to:
| Search Engine | Field | User-agent |
|---|---|---|
| All | General | * |
| Google | General | googlebot |
| Google | Images | googlebot-image |
| Google | Mobile | googlebot-mobile |
| Google | News | googlebot-news |
| Google | Video | googlebot-video |
| Google | eCommerce | storebot-google |
| Google | AdSense | mediapartners-google |
| Google | AdWords | adsbot-google |
| Yahoo! | General | slurp |
| Yandex | General | yandex |
| Baidu | General | baiduspider |
| Baidu | Images | baiduspider-image |
| Baidu | Mobile | baiduspider-mobile |
| Baidu | News | baiduspider-news |
| Baidu | Video | baiduspider-video |
| Bing | General | bingbot |
| Bing | General | msnbot |
| Bing | Images & Video | msnbot-media |
| Bing | Ads | adidxbot |
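If you want to verify how a given user-agent is matched against your rules before publishing the file, Python's standard-library urllib.robotparser can parse robots.txt text directly (the rules and URLs below are illustrative examples, not from a real site):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content: block Google's image bot from /photos/,
# and block every other bot only from /admin/.
rules = """
User-agent: googlebot-image
Disallow: /photos/

User-agent: *
Disallow: /admin/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# googlebot-image matches its specific group, so /photos/ is off-limits to it
print(rp.can_fetch("googlebot-image", "https://www.example.com/photos/cat.jpg"))  # False

# bingbot falls through to the wildcard group: /photos/ is allowed, /admin/ is not
print(rp.can_fetch("bingbot", "https://www.example.com/photos/cat.jpg"))  # True
print(rp.can_fetch("bingbot", "https://www.example.com/admin/login"))  # False
```

This is handy for catching mistakes (like a misspelled user-agent name) before crawlers ever see the file.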
3. Set Rules in Your Robots.txt File
A group specifies which user agent the rules apply to and contains one or more directives indicating which files or pages that user agent can or cannot access.
The most commonly used directives are:

- Disallow: blocks crawlers from accessing a path.
- Allow: overrides a Disallow for a more specific path (supported by Google and Bing).
- Sitemap: tells crawlers where to find your XML sitemap.
Note: According to Google's guidelines, the file size should not exceed 500 kibibytes (KiB); content beyond that limit may be ignored.
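Putting the directives together, a complete file might look like this (the paths and sitemap URL are placeholders to adapt to your own site):

```
User-agent: *
Disallow: /admin/
Allow: /admin/public/

Sitemap: https://www.example.com/sitemap.xml
```

Here every crawler is blocked from /admin/, except for the more specific /admin/public/ path, which the Allow directive re-opens.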
4. Upload Your Robots.txt File
Upload the file to your website's root directory. For example, if your domain is www.example.com, place the file at www.example.com/robots.txt.
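As a quick sanity check, the robots.txt location can be derived from any page URL on your site, since the file always sits at the domain root. A minimal Python sketch (the helper name robots_url is our own, not a library function):

```python
from urllib.parse import urlsplit

def robots_url(any_page_url):
    # robots.txt always lives at the domain root,
    # no matter which page URL you start from
    parts = urlsplit(any_page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

print(robots_url("https://www.example.com/blog/post/123"))
# https://www.example.com/robots.txt
```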
Conclusion
In this blog, we have walked through how to create a robots.txt file. These steps are simple to complete and can save you time and headaches by keeping content on your site from being crawled without your permission. Create a robots.txt file to tell search engine crawlers and bots which parts of your website to crawl and which to skip. Contact us now!