What Is A Robots.txt File? And How Do You Create One? (Beginner’s Guide)

Robots.txt is a file that puts you in charge of how crawlers and indexers handle your site and its pages. It is an implementation of the Robots Exclusion Protocol (REP) and lives in the root directory of your site. It might look unnecessary, but it guides how search engines view your site, and correct usage can improve your site's speed and boost its SEO.

This post familiarizes you with the Robots.txt file and how to make one. You will also learn how to use it and which loopholes to watch for. Back when the internet had just arrived, software engineers came up with robots to index web pages. These robots sometimes ended up on pages the owners would rather keep unindexed.

The REP is a set of instructions that all robots are meant to follow. However, compliance is voluntary: well-behaved robots honor the instructions, while others ignore them altogether.

The following URL shows you the robots.txt of any website:

http://[website_domain]/robots.txt
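
For example, https://www.google.com/robots.txt shows the file Google uses for its own site.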

Use of Robots.txt

Keep in mind that your site can function properly without a Robots.txt file, but having one brings additional benefits:

  • It keeps compliant bots out of private folders you don't want indexed.
  • It conserves your site's resources. Restricting access to unimportant files means your bandwidth and server resources go further.
  • It helps crawlers give your important pages the most attention.

How to find your Robots.txt file

This file is stored in the root directory of your website. In your FTP tool, click on public_html and you will find the site's root directory there. Your Robots.txt file can be viewed with any text editor. If the file isn't there, which is quite possible, you should create your own.

How to create a Robots.txt file

Using a text editor, open and save an empty file with the name robots.txt. Next, put it up on your server: use your FTP tool to access the server, open the public_html folder, navigate to the root directory of your site, and place robots.txt in it.
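
As a starting point, here is a minimal sketch of a robots.txt that lets every bot crawl everything; an empty Disallow value blocks nothing:

User-agent: *
Disallow: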

Now that you have a Robots.txt file, let's look at how to use it.

How to use Robots.txt

Change the rules in Robots.txt whenever you want to block search engines from parts of your site. You can, for example, stop Bing from indexing your contact page. Robots.txt has no direct power over SEO rankings, but by taking charge of crawlers it helps keep your site's speed under control.
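
To illustrate the Bing example above, this sketch blocks Bing's crawler (whose user-agent is Bingbot) from a hypothetical /contact/ page; swap in your page's actual path:

User-agent: Bingbot
Disallow: /contact/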

The following are commands for your Robots.txt file.

1. Restrict every bot from your site

Include this code in your Robots.txt file:

User-agent: *
Disallow: /

This command basically instructs bots not to go near anything on your site.

2. Restrict every bot's access to a particular folder

Here is the command:

User-agent: *
Disallow: /[folder_name]/

Use this command to keep crawlers out of a folder you particularly want protected.
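
For instance, to keep every bot out of a hypothetical folder named /private/ (the name is only an illustration):

User-agent: *
Disallow: /private/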

3. Restrict certain bots from your site

Use this command:

User-agent: [robot name]
Disallow: /

This lets you keep beneficial bots on your site and shut out the non-beneficial ones.
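
For example, some site owners block third-party SEO crawlers. AhrefsBot is one real crawler that identifies itself by that user-agent; it is used here purely as an illustration:

User-agent: AhrefsBot
Disallow: /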

4. Restrict a certain file from being crawled

The Robots Exclusion Protocol puts you in charge of restricting the access of robots to files and folders as desired.

Use this command:

User-agent: *
Disallow: /[folder_name]/[file_name.extension]
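
As a concrete sketch, to block a single hypothetical PDF file (both the folder and file names below are placeholders):

User-agent: *
Disallow: /downloads/report.pdf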

5. Restrict access to a folder but leave a file to be indexed

The Disallow directive blocks bots from accessing a folder or a file, while the Allow directive does the opposite. Combined, they let you give bots access to one file within a folder while keeping the rest of the folder off limits.

Use this command:

User-agent: *
Disallow: /[folder_name]/
Allow: /[folder_name]/[file_name.extension]
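
Concretely, to block a hypothetical /assets/ folder while still letting crawlers fetch one image inside it (the names are placeholders):

User-agent: *
Disallow: /assets/
Allow: /assets/logo.png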

You can also block every URL ending in a particular extension from being crawled.

Here’s the command:

User-agent: *
Disallow: /*.extension$

The ($) sign here signifies the end of the URL, i.e. the extension is the last string in the URL.

This is highly useful for keeping bots away from things like your scripts.
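
For example, to block every URL ending in .php (note that the * wildcard and $ anchor are extensions to the original REP, though major engines such as Google and Bing support them):

User-agent: *
Disallow: /*.php$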

6. Slow down bots' requests to your site

This delays crawl requests so that a flood of requests from several bots at once does not overload your site.
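
The customary way to do this is the Crawl-delay directive, sketched below with a ten-second delay. Support varies: Bing and Yandex honor it, while Google ignores it (Google's crawl rate is managed through Search Console instead).

User-agent: *
Crawl-delay: 10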

Common mistakes to avoid when using Robots.txt

Misusing Robots.txt is a frequent cause of SEO mishaps, and plenty of misinformation circulates about the topic. Below are the top mistakes to watch out for:

1. Use of Robots.txt to prevent content from being indexed

Bona fide bots will never crawl a disallowed folder. Two problems remain, however:

  • Search engines can still index a blocked page if outside sources link to it, even without crawling its content.
  • Rogue bots will ditch the instructions and crawl your content for indexing regardless.

The noindex meta tag is what prevents this. Simply include the tag in the head of any page you want to leave unindexed:

<meta name="robots" content="noindex">

This method matches SEO recommendations, even though it's not 100% effective against rogue bots. If you use an SEO plugin for WordPress, you won't have to worry about editing code at all. Note as well that Google stopped supporting a noindex rule inside robots.txt itself as of the 1st of September 2019, so the meta tag is the reliable route.

2. Using Robots.txt to protect private content

Simply restricting a directory with Robots.txt does not guarantee privacy. Your content can still end up indexed if outside sources link to it, and non-compliant bots can fetch it directly.

A strategy that works better is putting your private content behind a login. That way, no bot of any kind can reach it. The security is guaranteed, but it does mean a little more work for your site's guests.

3. Using Robots.txt to stop duplicate content from getting indexed

Search engines frown a lot on duplicate content, but Robots.txt is not the right tool here, since there is no guarantee that the duplicates won't get indexed anyway. Two better options:

  • You can delete the duplicate content, which I do not recommend, as it sends visitors to 404 pages.
  • You can use a 301 redirect, which informs visitors (and search engines) of your move and sends them to the original content. This is easier with a WordPress SEO plugin. Navigate carefully with either method because of the overall effect on your SEO.

Conclusion

The Robots.txt file remains your go-to tool for guiding how bots and search engine crawlers interact with your site. Appropriate usage of Robots.txt can support your site's ranking and makes crawling more efficient. Take advantage of this guide to equip yourself with the knowledge of Robots.txt, its installation, and its usage. Lastly, be careful to steer clear of the mistakes mentioned above.