How to write robots.txt

First, take a look at the basic example of how to write robots.txt shown above; write your directives in a plain .txt file in that format. The explanation for each directive is as follows.

User-Agent
The User-Agent line specifies which search robot the rules apply to. If you write "User-agent: *", the rules apply to all search engine robots. If you want to target only Google's crawler, write "User-agent: Googlebot"; to target another specific crawler, write the corresponding name.

User-agent: * → Applies to all crawlers
User-agent: Googlebot → Applies only to Google's crawler
User-agent: bingbot → Applies only to Bing's crawler

A short sketch of how these User-agent lines group rules is shown next.
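Purely as an illustration, here is a minimal robots.txt sketch showing how each User-agent line opens a group of rules for one crawler; the path /private/ and the two-group layout are placeholder assumptions, not part of the original example.

# Rules for every crawler
User-agent: *
Disallow: /private/

# Rules for Google's crawler only; Googlebot follows its most specific
# matching group, so it obeys these lines and ignores the group above
User-agent: Googlebot
Disallow:

The Disallow lines used here are explained in the next section; an empty "Disallow:" simply means nothing is blocked for that group.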
For the details of each crawler, please refer to the following:
Google crawler overview (user agents)
Bing Webmaster Tools
Twitter Developers
Facebook crawler

Disallow
The Disallow line is used to deny access. As in the example above, if nothing is written after "Disallow:", nothing is blocked; to deny access, write the path of the page or directory after it.

Sitemap
The Sitemap line tells search engines where your XML sitemap is located; in the example above it points to index_sitemap1.xml. *Sitemap file names other than "sitemap.xml" also work.

Allow
Although not shown in the example, you can explicitly permit access by writing Allow. However, since writing nothing already means access is permitted, there are few occasions to use this directive. Allow is used when you want to allow crawling of only some pages inside a directory that is otherwise disallowed.

A complete example that puts these directives together is sketched below. Now that you know how to create robots.txt, we will explain three points to keep in mind when actually creating it.
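As a rough sketch only, this is what a small robots.txt combining the directives above might look like; the domain example.com, the /members/ directory, and the allowed page are placeholder assumptions rather than values from the original example.

# Apply these rules to every crawler
User-agent: *
# Keep crawlers out of the /members/ directory...
Disallow: /members/
# ...but still allow crawling of this one page inside it
Allow: /members/guide.html

# Location of the XML sitemap (a name such as index_sitemap1.xml also works)
Sitemap: https://example.com/index_sitemap1.xml

The file is saved as plain text named robots.txt and placed in the site's root directory, so that crawlers can fetch it at https://example.com/robots.txt.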

While there are benefits, there are also risks, so be sure to take precautions.

3 points to be careful of

01: Do not use crawl denial for noindex purposes
First, denying crawling should not be used as a substitute for noindex. Crawl denial only blocks crawling; it cannot guarantee that a page stays out of the index. By specifying Disallow in robots.txt you control crawl access, so the page will usually not be indexed. However, if another page links to the disallowed page, it may still be indexed. For pages that you absolutely do not want indexed, use a noindex meta tag in the head, as in the sketch at the end of this section.

02: Do not deny crawling of all pages
The second thing to keep in mind is to avoid denying crawling of all pages.
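As referenced under point 01, here is a minimal sketch of a noindex meta tag placed in a page's head; the surrounding HTML is illustrative boilerplate, not markup from the original article.

<!DOCTYPE html>
<html>
<head>
  <!-- Tell search engines not to index this page, even if they crawl it -->
  <meta name="robots" content="noindex">
  <title>Page that should stay out of search results</title>
</head>
<body>
  ...
</body>
</html>

For the tag to take effect, the page must remain crawlable: if robots.txt disallows it, crawlers never see the tag, which is exactly why crawl denial is not a substitute for noindex.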