What is Robots.txt & How to Create a Robots.txt File?

    What is a robots.txt file?
    A robots.txt file is a plain text document that lives in a website’s root directory. It gives search engines a set of instructions about which pages and files on the site should and should not be crawled.

    A basic robots.txt file looks like this:
    User-Agent: *
    Disallow:
    Sitemap: https://www.example.com/sitemap.xml

    In this article, we'll cover these topics:

    • Why is a robots.txt file important for websites?
    • How to create a robots.txt file?
    • How to set your user-agent in the robots.txt file?

     

    Why is a robots.txt file important for websites?

    The robots.txt file helps manage web crawler activity, giving you more control over which parts of your website get crawled.

    Below are three main reasons to use a robots.txt file on a website:

    1. Optimize Crawl Budget

    Crawl budget refers to the number of pages that search engine bots will crawl and index on your website within a given time frame. Blocking low-value URLs keeps that budget focused on the pages you actually want indexed, as in the sketch below.
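
    For example, a minimal sketch that assumes the site generates filtered and sorted listing URLs through query parameters (the sort and filter parameter names here are hypothetical; the * wildcard is supported by major crawlers such as Googlebot and Bingbot):

    User-Agent: *
    # Skip parameterised listing URLs that duplicate the main listing pages
    Disallow: /*?sort=
    Disallow: /*?filter=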

    2. Block Duplicate Pages

    Bots don't need to sift through every page on your website, because not all pages are meant to be served in the SERPs: internal search results pages, duplicate pages, and login and register pages, for example.
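
    A minimal sketch that blocks these kinds of pages, assuming the hypothetical paths /search/, /login/, and /register/:

    User-Agent: *
    # Keep internal search results and account pages out of the crawl
    Disallow: /search/
    Disallow: /login/
    Disallow: /register/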

    3. Hide Resources

    If you want to keep resources such as PDFs, videos, and images out of the SERPs, the robots.txt file can keep them from being crawled.
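
    A minimal sketch, assuming the files sit in a hypothetical /downloads/ directory (the $ wildcard, which anchors the match to the end of the URL, is supported by Google and Bing but is not part of the original robots.txt standard):

    User-Agent: *
    # Block the downloads directory and any PDF anywhere on the site
    Disallow: /downloads/
    Disallow: /*.pdf$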

     

    How to create a robots.txt file?

    You can use a robots.txt generator tool or create the file yourself.

    Here’s how:

    1. Create a File & Name it robots.txt

    Start by opening a blank .txt document in any plain text editor.

    Next, save the document as robots.txt.

    What matters most is where the file is placed. Create the robots.txt file so that it can be found at:

    • The root of your domain: www.example.com/robots.txt
    • Your subdomains: page.example.com/robots.txt

    Note: A robots.txt file placed in a subdirectory of your domain
    (https://www.example.com/page/robots.txt) will not be found by crawlers.

    A basic robots.txt file looks like this:
    User-Agent: *
    Allow: /
    Sitemap: https://www.example.com/sitemap.xml

    2. Add Directives to the Robots.txt File

    This file consists of one or more groups of directives, and each group consists of multiple lines of instructions.

    Each group starts with a “user-agent” line and specifies:

    • Who the group applies to (user-agent)
    • Which pages or files the bots can access
    • Which pages or files the bots can’t access
    • A sitemap to tell search engine bots which pages and files you deem important

    Note: You can start your robots.txt file with groups for specific user agents first, and then move on to the more general wildcard (*) that matches all crawlers or bots, as in the sketch below.
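
    A minimal sketch of that ordering, with one group for Googlebot followed by a wildcard group (the /private/ and /tmp/ paths are hypothetical):

    User-Agent: googlebot
    Disallow: /private/

    User-Agent: *
    Disallow: /tmp/

    Sitemap: https://www.example.com/sitemap.xml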

    How to set your user-agent in the robots.txt file?

    The next step is to set the user-agent in your robots.txt file. The user-agent identifies the bot or crawler you wish to allow or block. A few common bots and crawlers, along with the search engines and fields they belong to, are listed below:

    Search Engine    Field             User-agent
    All              General           *
    Google           General           googlebot
    Google           Images            googlebot-image
    Google           Mobile            googlebot-mobile
    Google           News              googlebot-news
    Google           Video             googlebot-video
    Google           eCommerce         storebot-google
    Google           AdSense           mediapartners-google
    Google           AdWords           adsbot-google
    Yahoo!           General           slurp
    Yandex           General           yandex
    Baidu            General           baiduspider
    Baidu            Images            baiduspider-image
    Baidu            Mobile            baiduspider-mobile
    Baidu            News              baiduspider-news
    Baidu            Video             baiduspider-video
    Bing             General           bingbot
    Bing             General           msnbot
    Bing             Images & Video    msnbot-media
    Bing             Ads               adidxbot
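
    For example, a minimal sketch that uses one of these user-agents to keep Google Images out of a hypothetical /photos/ directory while leaving every other crawler unrestricted:

    User-Agent: googlebot-image
    Disallow: /photos/

    User-Agent: *
    Disallow: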

    Set Rules in Your Robots.txt File

    A group specifies which user agent it applies to and contains one or more rules, or directives, indicating which files or pages that user agent can or cannot access.

    The directives used are:

    • Disallow: This directive refers to a page or directory, relative to your root domain, that you do not want the user agent to crawl. It starts with a slash (/) followed by the page path, and ends with a slash only if it refers to a directory rather than a single page. You can use more than one Disallow line per group.
    • Allow: This directive refers to a page or directory, relative to your root domain, that you do want the user agent to crawl, for example to make an exception inside a disallowed directory (see the sketch after this list). It also starts with a slash (/) followed by the page path, and ends with a slash only if it refers to a directory rather than a single page. You can use more than one Allow line per group.
    • Sitemap: This directive is optional. If used, it must be a fully qualified URL, and you can include it zero or more times, as needed.
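
    A minimal sketch combining all three directives, assuming a hypothetical /admin/ directory in which one page, /admin/help.html, should remain crawlable:

    User-Agent: *
    # Block the admin area but keep the public help page crawlable
    Disallow: /admin/
    Allow: /admin/help.html
    Sitemap: https://www.example.com/sitemap.xml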

    Note: According to Google's guidelines, the file should not exceed 500 kibibytes (KiB); content beyond that limit is ignored.

     

    Upload Your Robots.txt File

    Upload this file to your website’s root directory. For example, if your domain is www.example.com, place the file so that it is reachable at www.example.com/robots.txt.

    Conclusion

    In this blog, we have gone through how to create a robots.txt file. These steps are simple to complete and can save you the time and headaches of having content on your site crawled without your permission. Create a robots.txt file to tell search engine crawlers and bots what they should and should not crawl on your website. Contact us now!



    The Brandz Creation Team

    Digital Marketing Experts

    The Brandz Creation team consists of SEO Experts, Content Marketers, Website Designers & Developers and Digital Marketing professionals.
