In this article I cover bots, both useful and harmful, so that you know which ones can be blocked and which should be left alone. A separate article explains how to block malicious bots to reduce the load on your site.

I’ll occasionally dig through the logs and look for new ones. For now, the list covers only the bots that have visited my own sites.
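Digging through the logs for bots can be partly automated. The snippet below is a minimal sketch, assuming the access log uses the common "combined" format, in which the User-Agent is the last double-quoted field on each line. The sample lines and the function name are hypothetical and only serve as an illustration.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the User-Agent is the last
# double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each User-Agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; in practice, pass an open log file instead.
sample_lines = [
    '203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
    '203.0.113.7 - - [01/Jan/2024:00:00:02 +0000] "GET /about HTTP/1.1" 200 734 "-" '
    '"Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
    '203.0.113.9 - - [01/Jan/2024:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

counts = count_user_agents(sample_lines)
for user_agent, hits in counts.most_common():
    print(hits, user_agent)
```

Sorting by request count puts the most active clients at the top, which is usually where an unfamiliar crawler shows up first.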

Contents

1 Useful bots and crawlers

1.1 Amazonbot

1.2 GrapeshotCrawler/2.0

1.3 Googlebot/2.1 (Googlebot)

1.4 YandexTurbo/1.0

1.5 YandexBot/3.0

1.6 YandexAccessibilityBot/3.0

1.7 YandexMetrika/2.0 and YandexMetrika/3.0, YandexMetrika/4.0

1.8 YandexPartner/3.0

1.9 ias-va/3.1, ias-jp/3.1

1.10 Bingbot

1.11 newspaper/0.2.8

1.12 Mail.RU_Bot/2.0, Mail.RU_Bot/Img/2.0

1.13 vkShare

1.14 facebookexternalhit/1.1 Facebot Twitterbot/1.0

1.15 Mediapartners-Google

1.16 FeedBurner/1.0

1.17 CriteoBot/0.1

2 Bad bots and crawlers

2.1 DotBot

2.2 BLEXBot

2.3 AhrefsBot

2.4 MBCrawler

2.5 YaK/1.0

2.6 niraiya.com/2.0 (Stolen Passwords Checker Bot)

2.7 MegaIndex.ru/2.0

2.8 MJ12bot

2.9 SemrushBot

2.10 Cloudfind

2.11 GetIntent Crawler

2.12 SafeDNSBot

2.13 SeopultContentAnalyzer/1.0

2.14 serpstatbot/2.0

2.15 LinkpadBot

2.16 Slurp

2.17 DataForSeoBot/1.0

2.18 Rome Client (http://tinyurl.com/64t5n)

2.19 Scrapy

2.20 FlipboardRSS

2.21 FlipboardProxy

2.22 Proximic Bot

2.23 ZoominfoBot

2.24 SeznamBot/3.2

2.25 Seekport Crawler

Useful bots and crawlers

This list covers useful bots and crawlers along with some information about each. I recommend reading it before you block any of them. Keep in mind that a bot or crawler that is useful for some sites may be useless for others.

Amazonbot

It’s hard to call Amazon’s bot unambiguously useful. This crawler collects information and analyzes pages for the Amazon Alexa service, which acts as a voice assistant.

Although Amazonbot can be useful, it often creates too much load on the server and may not obey the directives in the robots.txt file.

Identifies itself as: (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)

If a given bot is causing problems, it is best to block it.

GrapeshotCrawler/2.0

Oracle Data Cloud Crawler is an automated crawler from Oracle. It analyzes page content for advertisers. It is used in many real-time bidding (RTB) systems and also in AdSense, so blocking this bot can hurt your ad revenue.

Identifies itself as: (compatible; GrapeshotCrawler/2.0; +http://www.grapeshot.co.uk/crawler.php).

GrapeshotCrawler/2.0 IP address range:

132.145.9.5
132.145.11.125
132.145.14.70
132.145.15.209
132.145.64.33
132.145.66.116
132.145.66.156
132.145.67.248
140.238.81.78
140.238.83.181
140.238.94.137
140.238.95.47
140.238.95.199
152.67.128.219
152.67.137.35
152.67.138.180
148.64.56.64 to 148.64.56.80.
148.64.56.112 to 148.64.56.128.

Blocking it unnecessarily is not recommended, but if your site does not carry contextual advertising, you can get rid of this crawler.
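Before allowing or blocking a request that claims to be GrapeshotCrawler, it can help to check whether its IP actually falls within the published addresses. The following is a rough sketch using Python's standard `ipaddress` module. It includes only a subset of the addresses from the list above, and the function name is hypothetical.

```python
import ipaddress

# A subset of the single addresses from the list above (extend as needed).
SINGLE_ADDRESSES = {
    ipaddress.ip_address("132.145.11.125"),
    ipaddress.ip_address("140.238.81.78"),
    ipaddress.ip_address("152.67.128.219"),
}

# Inclusive start/end pairs for the two ranges from the list above.
RANGES = [
    (ipaddress.ip_address("148.64.56.64"), ipaddress.ip_address("148.64.56.80")),
    (ipaddress.ip_address("148.64.56.112"), ipaddress.ip_address("148.64.56.128")),
]


def is_grapeshot_ip(ip):
    """Return True if the given IPv4 address string is in the known set."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        # The list contains only IPv4 addresses.
        return False
    if addr in SINGLE_ADDRESSES:
        return True
    return any(start <= addr <= end for start, end in RANGES)


print(is_grapeshot_ip("148.64.56.70"))
print(is_grapeshot_ip("198.51.100.1"))
```

A request whose User-Agent names GrapeshotCrawler but whose IP fails this check is likely using a spoofed identity.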

Googlebot/2.1 (Googlebot)

Google’s search engine robot, which crawls and indexes website pages. You should not block it, as doing so can hurt your positions in Google search.

Identifies itself as: (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

If this robot is putting too much load on the server, you can reduce the crawl rate (https://support.google.com/webmasters/answer/48620).

YandexTurbo/1.0

Crawler for Yandex Turbo pages. It appears on a site only if Turbo pages are connected to it, and it crawls the RSS feed for Turbo pages.

Identifies itself as: (compatible; YandexTurbo/1.0; +http://yandex.com/bots).

Blocking YandexTurbo/1.0 is not recommended, as it may prevent Turbo pages from being displayed in Yandex.

YandexBot/3.0

Yandex’s search crawler, which is also its main indexing robot. It crawls pages and collects the data they contain. Blocking is not recommended, as it can negatively affect the site’s position in Yandex search.

Identifies itself as: (compatible; YandexBot/3.0; +http://yandex.com/bots).

If this crawler creates too much load on the server, you can limit its crawl rate in the Yandex Webmaster settings.

YandexAccessibilityBot/3.0

Checks whether pages are available to users by downloading them. Blocking is not recommended, as it can hurt your positions in Yandex search. It ignores the crawl rate settings in Yandex Webmaster.

YandexMetrika/2.0 and YandexMetrika/3.0, YandexMetrika/4.0

Yandex Metrica robots. They appear on a site only when Metrica is connected to it. YandexMetrika/4.0 downloads styles for Yandex Metrica so that they are displayed correctly in Webvisor.

YandexPartner/3.0

Downloads information about the pages of sites connected to the Yandex Partner Network, checks that advertising matches the content, and monitors the policy for assigning rates on specific pages.

ias-va/3.1, ias-jp/3.1

The ias-va/3.1 and ias-jp/3.1 crawlers from ADmantX are used in the AdSense network, so they should not be blocked if you use AdSense on your site. These crawlers collect data about the semantics of a website.

Identified as: ias-va/3.1 (+https://www.admantx.com/service-fetcher.html).

ias-jp/3.1 (+https://www.admantx.com/service-fetcher.html).

Bingbot

bingbot/2.0 is the search crawler of the Bing search engine. Since I get traffic from Bing from time to time, I can’t put it on the bad list. The load it creates on a site is comparable to that of the Google and Yandex bots. It’s better not to block it, but if it causes problems and Bing brings you no traffic, you can block it.

Identifies itself as: (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm).

newspaper/0.2.8

Judging by its IP, it belongs to GoogleUserContent and is quite possibly collecting content for Google’s recommendation systems and Google News. It’s best not to block it.

Mail.RU_Bot/2.0, Mail.RU_Bot/Img/2.0

Indexing crawlers of the Mail.ru company. Mail.RU_Bot/2.0 is a search crawler that traverses the pages of a website and adds them to the search engine index.

Mail.RU_Bot/Img/2.0 is a bot that crawls images. I haven’t seen any traffic from there yet, but it’s still best not to block it, especially if the site specializes in media content.

vkShare

A bot that visits a website when a visitor shares one of its pages to the social network Vkontakte through a widget. It fetches data such as the site favicon, the image of the page being shared, and the page’s title and description.

Identifies itself as: (compatible; vkShare; +http://vk.com/dev/Share).

If vkShare is blocked, sharing pages to Vkontakte will not work correctly.

facebookexternalhit/1.1 Facebot Twitterbot/1.0

Facebook and Twitter crawlers, as the names make clear. They collect data from your extended descriptions and from the pages themselves in order to display shared links. There is a suspicion that they also check content for compliance with “Community Norms”, but this is not certain.

If you need pages to display correctly when shared to these social networks, it is better not to block them.

Mediapartners-Google

A bot that checks partner sites in Google AdSense. It is required for contextual advertising to be processed properly. If you are an AdSense partner, you should not block it, as doing so may reduce your ad revenue.

FeedBurner/1.0

A Google tool that reads RSS feeds. For what purpose is not entirely clear. Identifies itself as: FeedBurner/1.0 (http://www.FeedBurner.com). Blocking it is not recommended, but if it creates a heavy load, it can be blocked.

CriteoBot/0.1

Criteo’s crawler. It is supposed to check whether the page content is relevant to marketing goals, for example by analyzing an article and then assigning it to a certain category.

Identifies itself as: CriteoBot/0.1 (+https://www.criteo.com/criteo-crawler/).

It is used by the advertising networks of Yandex, Mail.ru, Yahoo, and Rambler, so it is better not to block CriteoBot/0.1.

Bad bots and crawlers

This part covers bad bots that should be blocked to reduce the load on the site’s server. Read it carefully, though, as some of these bots may be useful for your particular site.
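Most of these bots can be recognized by a token in their User-Agent string. The sketch below shows one way to test a User-Agent against a blocklist of such tokens. The token list is only a sample taken from this part of the article, the function name is hypothetical, and the list should be adjusted to the bots you actually want to keep out.

```python
# Sample tokens taken from the bots described in this part of the article.
# Adjust the list for your own site; this is not a complete blocklist.
BAD_BOT_TOKENS = (
    "DotBot",
    "BLEXBot",
    "AhrefsBot",
    "MJ12bot",
    "SemrushBot",
)


def is_blocked(user_agent, tokens=BAD_BOT_TOKENS):
    """Return True if any blocklist token occurs in the User-Agent string.

    The comparison is case-insensitive, because bots do not always
    capitalize their names consistently.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in tokens)


print(is_blocked("Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"))
print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
```

Matching on a substring keeps the check independent of the version number, so it keeps working when a bot moves from one version to the next.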

DotBot

Moz’s bot. It collects statistics about sites for commercial sale to clients of the Moz service. This bot is useful only to sites that work with Moz via its API; otherwise it is unnecessary load.

Identifies itself as: (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com).

BLEXBot

Collects SEO data about a website for commercial sale to customers. Creates unnecessary load and also makes site data transparent to competitors. Blocking is recommended.

Identifies itself as: (compatible; BLEXBot/1.0;).

AhrefsBot

The bot of Ahrefs, an SEO analytics company. It collects data about your website (SEO, link building, traffic) and then sells it to clients. It is better to block it, as this data can be useful to your competitors.

Identifies itself as: (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/).

MBCrawler

MBCrawler/1.0, developed by MonitorBacklinks, analyzes backlinks and puts a serious strain on websites. It is very active and can gather a lot of information about backlinks from your site. Identifies itself as: MBCrawler/1.0 (https://monitorbacklinks.com/robot). Better to block.

YaK/1.0

This is a bot from LinkFluence. It collects data about websites for further commercial use. Accordingly, it can be used by competitors against you. Blocking is recommended.

Identifies itself as: (compatible; YaK/1.0; http://linkfluence.com/; bot@linkfluence.com).

niraiya.com/2.0 (Stolen Passwords Checker Bot)

A stolen-password checking bot from Niraiya, a company that sells a password manager. It is most likely checking the site for password leaks, but it creates unnecessary load. It’s better to block.

Identifies itself as: (compatible; niraiya.com/2.0;)

MegaIndex.ru/2.0

The Megaindex.ru bot. It collects data about your website, its SEO and backlinks, and then provides this information on a commercial basis. The bot can be considered malicious if you don’t use the service to analyze your own site. It also makes your website data transparent to your competitors.

Identifies itself as: (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler).

MJ12bot

Another SEO analytics bot. Once it starts showing up on a site, it loads it seriously. This is Majestic’s bot, and blocking MJ12bot should be mandatory unless you work with services related to Majestic. For example, if you work with link exchanges like Miralinks, Majestic’s metrics are very important there, so you should not block it.

SemrushBot

SemrushBot, from the SEO service Semrush, periodically creates a serious load on the site by crawling it over and over again. The data this bot collects is sold commercially.

Accordingly, the information collected by this bot will be available to competitors, which may not be to your advantage.

Cloudfind

A bot from the company of the same name that looks for partners for affiliate marketing. It mostly crawls foreign sites but occasionally appears in the Russian-language segment.

GetIntent Crawler

Crawler by GetIntent. It collects data about websites for marketing purposes, such as analyzing prospects for contextual advertising. It is not known which advertising platforms it cooperates with. I couldn’t find any information on whether this crawler works with AdSense or MSN, so I decided to block it.

SafeDNSBot

A bot from SafeDNS. The company positions itself as protection against malicious sites and periodically checks sites for security. It creates only a small load on the site, so you can leave it unblocked.

SeopultContentAnalyzer/1.0

The PromoPult (formerly SeoPult) bot collects SEO data about a website, such as backlinks and keywords. The collected data is analyzed and provided to your competitors on a commercial basis. Blocking SeopultContentAnalyzer/1.0 is recommended.

serpstatbot/2.0

Bot from the well-known Serpstat platform. It constantly analyzes sites for backlinks and uses the information commercially as part of its service. Besides the additional load on the server, it gives competitors more information about your site. Blocking is recommended.

LinkpadBot

The bot of the LinkPad service. LinkpadBot collects information about your site’s link profile for commercial use. As a result, your competitors can get data about the links you place on the site, and networks of satellite sites can be exposed. It is better to block this bot.

Slurp

The Yahoo! search crawler. It has not been caught doing anything particularly bad, but there is practically no traffic from Yahoo in the CIS, so Slurp will not be of much use there. It is better to block it, since it sometimes starts crawling sites very actively.

If the site is aimed at an overseas audience, it’s best to leave it.

DataForSeoBot/1.0

The bot of the DataForSeo service. It checks backlinks and analyzes the site for later commercial use, for example to provide your site’s SEO data to competitors.

DataForSeoBot/1.0 is of no use, so it is better to block it.

Rome Client (http://tinyurl.com/64t5n)

It is not known what kind of crawler Rome Client is; I have not found any information about it. Judging by the IP, the requests come from Amazon AWS. It targets the site’s feed specifically, and quite possibly downloads it for its own purposes. Since neither the bot nor its purpose is known, it is better to block it.

Scrapy

Scrapy is an open-source bot designed to crawl sites and pull data from them. Why? The goals can differ, both good and bad. In general, this bot is best blocked.

FlipboardRSS

The Flipboard platform’s bot, which takes your RSS feed for publishing. It is generally not malicious, and even necessary if you publish your content on Flipboard. The problem is that anyone can publish your RSS feed on this service. You won’t get traffic from there, but you will get periodic crawling by the bot.

FlipboardProxy

Also from Flipboard. It checks your site and analyzes how it looks, which is needed to display materials on Flipboard. If there is no traffic from this service, you can block the bot.

Proximic Bot

This bot sometimes shows up in the logs. Identifies itself as: (compatible; proximic; +https://www.comscore.com/Web-Crawler).

It checks how well content and contextual advertising match. Whether it works with AdSense or RFE is unknown, so I cannot call it useful. It is quite possible that this bot simply collects information for its own projects and “trains” on third-party sites to determine the subject of data in different languages more accurately.

ZoominfoBot

The only data in the identification string is: (zoominfobot at zoominfo dot com). Gathers only business information from the site, usually pulls the entire site feed. It is practically useless for the Russian-speaking audience.

It collects information for commercial purposes, aggregating it and making it available to its users on a paid basis. It’s better to block.

SeznamBot/3.2

Crawler of the Czech search engine Seznam. If your site isn’t in Czech, it’s probably of no use, and this “search engine” has hardly any visitors anyway. In the entire life of my site I have not seen a single visitor from there, so I consider this bot harmful and recommend blocking it.

Seekport Crawler

Crawler of another “underdog” search engine. No traffic from it has been seen, the search engine has no prospects, and there is almost no information about it.

Identifies itself as: (compatible; Seekport Crawler; http://seekport.com/).

I think there is little point in letting their crawler onto your site; the prospects are poor, especially for the CIS.
