What is Robots.txt file in SEO

It is a group of web standards that regulates the behavior of web robots and the indexing of web content.

It consists of the following:

1: The original REP from 1994, extended in 1997, defining crawler directives for robots.txt. Some search engines support extensions such as URI patterns (wildcards).

2: Its extension from 1996, defining indexer directives (REP tags) for use in the robots meta element, also known as the "robots meta tag." Search engines additionally support REP tags supplied via an X-Robots-Tag HTTP header, which lets webmasters apply REP tags to non-HTML resources such as PDF documents or images.
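For example, a server could mark a PDF as non-indexable by attaching the header to its response. A sketch of such a response (the header values are illustrative, not prescriptive):

```http
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow
```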

3: The microformat rel-nofollow from 2005, defining how search engines should handle links whose a element's rel attribute contains the value "nofollow."
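In HTML markup, such a link looks like the following (the URL is a placeholder):

```html
<!-- Search engines should not pass ranking credit through this link -->
<a href="https://example.com/some-page" rel="nofollow">Example link</a>
```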

What is a Robots.txt File?

The robots exclusion protocol (REP), or robots.txt, is a text file webmasters create to instruct robots (typically search engine robots) how to crawl and index pages on their website.

Structure of a Robots.txt File.

The structure of a robots.txt file is pretty simple (and barely flexible): it is a list of user agents and of disallowed files and directories. Basically, the syntax is as follows:

User-agent: *
Disallow: /temp/
# All user agents are disallowed to crawl the /temp directory

User-agent: Googlebot
Disallow: /temp/
# Only Googlebot is disallowed to crawl the /temp directory
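Rules like these can be checked programmatically. A minimal sketch using Python's standard urllib.robotparser module (the bot name and URLs are hypothetical):

```python
from urllib import robotparser

# A minimal robots.txt: block every bot from the /temp/ directory
rules = """\
User-agent: *
Disallow: /temp/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)  # parse() accepts the file's lines directly

print(rp.can_fetch("AnyBot", "https://example.com/temp/cache.html"))  # False
print(rp.can_fetch("AnyBot", "https://example.com/index.html"))       # True
```

can_fetch() answers the same question a well-behaved crawler asks before requesting a URL.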

Important Rules.

In most cases, meta robots with the parameters "noindex, follow" should be employed as a way to restrict crawling or indexation.
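Such a tag goes in the page's head element; a sketch:

```html
<!-- Do not index this page, but do follow the links on it -->
<meta name="robots" content="noindex, follow">
```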

It is important to note that malicious crawlers are likely to completely ignore robots.txt and as such, this protocol does not make a good security mechanism.

Only one “Disallow:” line is allowed for each URL.

Each subdomain on a root domain uses separate robots.txt files.

Google and Bing accept two specific regular expression characters for pattern exclusion (* and $).
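For instance, the following (illustrative) rules use both characters:

```
User-agent: *
# "*" matches any sequence of characters:
# block every URL that contains a query string
Disallow: /*?
# "$" anchors the match to the end of the URL:
# block every URL that ends in .pdf
Disallow: /*.pdf$
```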

The filename of robots.txt is case-sensitive. Use "robots.txt", not "Robots.TXT."

Spacing is not an accepted way to separate query parameters. For example, “/category/ /product page” would not be honored by robots.txt.

SEO Best Practice

There are three main ways of blocking pages:

1: Block with robots.txt

2: Block with meta noindex

3: Block by nofollowing links.

The Traps of a Robots.txt File

The more serious problem is with logical errors. For instance:

User-agent: *
Disallow: /temp/

User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/
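The trap here is that a crawler matching a specific User-agent group obeys only that group and ignores the rules under User-agent: *. A sketch demonstrating this with Python's standard urllib.robotparser module (the bot names and URLs are hypothetical):

```python
from urllib import robotparser

# Suppose the webmaster forgot to repeat /temp/ under the Googlebot group:
rules = """\
User-agent: *
Disallow: /temp/

User-agent: Googlebot
Disallow: /images/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Googlebot matches its own group and ignores "User-agent: *",
# so /temp/ is NOT blocked for it:
print(rp.can_fetch("Googlebot", "https://example.com/temp/x.html"))  # True
# Every other bot still honors the "*" group:
print(rp.can_fetch("OtherBot", "https://example.com/temp/x.html"))   # False
```

This is why a file that singles out Googlebot must repeat every rule that should still apply to it, as the example above does with Disallow: /temp/.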