lyfsy

Pinterestbot Flood

Jan 23rd, 2020
  1. Pinterestbot Flood
  2. Anybody else seeing a flood of traffic from a User Agent identifying itself as Pinterestbot?
  14.  
  15. It seems all of our servers are being absolutely flooded with requests from this bot.
  16.  
  17. Is this one of those badly behaved bots?
  18.  
  19. How is everybody else combating this? Or am I the only one who seems to be affected by this?
  20. I have seen it around, but not flooding us with requests per se. See https://help.pinterest.com/en/busine...terest-crawler for more information on, at least, their real bot.
  21. If a site receives a large number of requests from a particular type of bot (in your case, the Pinterest bot), the usual fix is to block it.
  22. We recommend blocking bots that have proven to be malicious on the network, as well as the Pinterest bot that is causing you problems.
  23. Do the following to block them:
  24. Step 1 - find the exact name of the bot.
  25. - To block the Pinterest bot, you need to know the exact name of the bot, you need to know the exact name of the bot. You can find it in the access logs of the site in the field "user_agent".
  26. Once you know the name of the bot (we will use PinterestCrawler as an example),
  27. change the line RewriteCond %{HTTP_USER_AGENT} "^PinterestCrawler" [NC,OR] in the code below, replacing PinterestCrawler with the name you found in the access log. The leading ^ anchors the match to the start of the User-Agent string, so drop it if the name appears later in the string.
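Step 1 can be sketched in Python as a quick tally of User-Agent strings. This is a minimal sketch, assuming the Apache "combined" log format (where the User-Agent is the last double-quoted field); the sample lines, IP addresses, and the function name are hypothetical and only there for illustration.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, in which the User-Agent is the
# last double-quoted field on each line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(lines):
    """Count how many requests each User-Agent string made."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample lines; in practice, iterate over the real access log file.
sample = [
    '203.0.113.5 - - [28/Oct/2019:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Pinterestbot/1.0; +https://www.pinterest.com/bot.html)"',
    '203.0.113.5 - - [28/Oct/2019:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Pinterestbot/1.0; +https://www.pinterest.com/bot.html)"',
    '198.51.100.7 - - [28/Oct/2019:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

# Print the busiest User-Agents first.
for user_agent, hits in count_user_agents(sample).most_common(5):
    print(hits, user_agent)
```

The most frequent entries show the exact string to match against, including whether the bot name sits at the start of the User-Agent or somewhere inside it.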
  28.  
  37. Step 2 - Block Bots.
  38. - Log in to your account via FTP or SSH.
  39. - Create a .htaccess file in the root directory of the site (or open the existing one).
  40. - Add the following to the .htaccess file. This blocks malicious bots and reduces the load they place on the server.
  41.  
  42. RewriteEngine on
  43. # Start bot locks
  44. # Change the one line below to block the Pinterest bot (drop the ^ if the name is not at the start of the User-Agent).
  45. RewriteCond %{HTTP_USER_AGENT} "^PinterestCrawler" [NC,OR]
  46. RewriteCond %{HTTP_USER_AGENT} "^Mozilla.*Indy" [NC,OR]
  47. RewriteCond %{HTTP_USER_AGENT} "^Mozilla.*NEWT" [NC,OR]
  48. RewriteCond %{HTTP_USER_AGENT} "^$" [NC,OR]
  49. RewriteCond %{HTTP_USER_AGENT} "^Maxthon$" [NC,OR]
  50. RewriteCond %{HTTP_USER_AGENT} "^SeaMonkey$" [NC,OR]
  51. RewriteCond %{HTTP_USER_AGENT} "^Acunetix" [NC,OR]
  52. RewriteCond %{HTTP_USER_AGENT} "^binlar" [NC,OR]
  53. RewriteCond %{HTTP_USER_AGENT} "^BlackWidow" [NC,OR]
  54. RewriteCond %{HTTP_USER_AGENT} "^Bolt 0" [NC,OR]
  55. RewriteCond %{HTTP_USER_AGENT} "^BOT for JCE" [NC,OR]
  56. RewriteCond %{HTTP_USER_AGENT} "^Bot mailto\:craftbot@yahoo\.com" [NC,OR]
  57. RewriteCond %{HTTP_USER_AGENT} "^casper" [NC,OR]
  58. RewriteCond %{HTTP_USER_AGENT} "^checkprivacy" [NC,OR]
  59. RewriteCond %{HTTP_USER_AGENT} "^ChinaClaw" [NC,OR]
  60. RewriteCond %{HTTP_USER_AGENT} "^clshttp" [NC,OR]
  61. RewriteCond %{HTTP_USER_AGENT} "^cmsworldmap" [NC,OR]
  62. RewriteCond %{HTTP_USER_AGENT} "^Custo" [NC,OR]
  63. RewriteCond %{HTTP_USER_AGENT} "^Default Browser 0" [NC,OR]
  64. RewriteCond %{HTTP_USER_AGENT} "^diavol" [NC,OR]
  65. RewriteCond %{HTTP_USER_AGENT} "^DIIbot" [NC,OR]
  66. RewriteCond %{HTTP_USER_AGENT} "^DISCo" [NC,OR]
  67. RewriteCond %{HTTP_USER_AGENT} "^dotbot" [NC,OR]
  68. RewriteCond %{HTTP_USER_AGENT} "^Download Demon" [NC,OR]
  69. RewriteCond %{HTTP_USER_AGENT} "^eCatch" [NC,OR]
  70. RewriteCond %{HTTP_USER_AGENT} "^EirGrabber" [NC,OR]
  71. RewriteCond %{HTTP_USER_AGENT} "^EmailCollector" [NC,OR]
  72. RewriteCond %{HTTP_USER_AGENT} "^EmailSiphon" [NC,OR]
  73. RewriteCond %{HTTP_USER_AGENT} "^EmailWolf" [NC,OR]
  74. RewriteCond %{HTTP_USER_AGENT} "^Express WebPictures" [NC,OR]
  75. RewriteCond %{HTTP_USER_AGENT} "^extract" [NC,OR]
  76. RewriteCond %{HTTP_USER_AGENT} "^ExtractorPro" [NC,OR]
  77. RewriteCond %{HTTP_USER_AGENT} "^EyeNetIE" [NC,OR]
  78. RewriteCond %{HTTP_USER_AGENT} "^feedfinder" [NC,OR]
  79. RewriteCond %{HTTP_USER_AGENT} "^FHscan" [NC,OR]
  80. RewriteCond %{HTTP_USER_AGENT} "^FlashGet" [NC,OR]
  81. RewriteCond %{HTTP_USER_AGENT} "^flicky" [NC,OR]
  82. RewriteCond %{HTTP_USER_AGENT} "^g00g1e" [NC,OR]
  83. RewriteCond %{HTTP_USER_AGENT} "^GetRight" [NC,OR]
  84. RewriteCond %{HTTP_USER_AGENT} "^GetWeb\!" [NC,OR]
  85. RewriteCond %{HTTP_USER_AGENT} "^Go\!Zilla" [NC,OR]
  86. RewriteCond %{HTTP_USER_AGENT} "^Go\-Ahead\-Got\-It" [NC,OR]
  87. RewriteCond %{HTTP_USER_AGENT} "^grab" [NC,OR]
  88. RewriteCond %{HTTP_USER_AGENT} "^GrabNet" [NC,OR]
  89. RewriteCond %{HTTP_USER_AGENT} "^Grafula" [NC,OR]
  90. RewriteCond %{HTTP_USER_AGENT} "^harvest" [NC,OR]
  91. RewriteCond %{HTTP_USER_AGENT} "^HMView" [NC,OR]
  92. RewriteCond %{HTTP_USER_AGENT} "^Image Stripper" [NC,OR]
  93. RewriteCond %{HTTP_USER_AGENT} "^Image Sucker" [NC,OR]
  94. RewriteCond %{HTTP_USER_AGENT} "^InterGET" [NC,OR]
  95. RewriteCond %{HTTP_USER_AGENT} "^Internet Ninja" [NC,OR]
  96. RewriteCond %{HTTP_USER_AGENT} "^InternetSeer\.com" [NC,OR]
  97. RewriteCond %{HTTP_USER_AGENT} "^jakarta" [NC,OR]
  98. RewriteCond %{HTTP_USER_AGENT} "^Java" [NC,OR]
  99. RewriteCond %{HTTP_USER_AGENT} "^JetCar" [NC,OR]
  100. RewriteCond %{HTTP_USER_AGENT} "^JOC Web Spider" [NC,OR]
  101. RewriteCond %{HTTP_USER_AGENT} "^kanagawa" [NC,OR]
  102. RewriteCond %{HTTP_USER_AGENT} "^kmccrew" [NC,OR]
  103. RewriteCond %{HTTP_USER_AGENT} "^larbin" [NC,OR]
  104. RewriteCond %{HTTP_USER_AGENT} "^LeechFTP" [NC,OR]
  105. RewriteCond %{HTTP_USER_AGENT} "^libwww" [NC,OR]
  106. RewriteCond %{HTTP_USER_AGENT} "^Mass Downloader" [NC,OR]
  107. RewriteCond %{HTTP_USER_AGENT} "^microsoft\.url" [NC,OR]
  108. RewriteCond %{HTTP_USER_AGENT} "^MIDown tool" [NC,OR]
  109. RewriteCond %{HTTP_USER_AGENT} "^miner" [NC,OR]
  110. RewriteCond %{HTTP_USER_AGENT} "^Mister PiX" [NC,OR]
  111. RewriteCond %{HTTP_USER_AGENT} "^MSFrontPage" [NC,OR]
  112. RewriteCond %{HTTP_USER_AGENT} "^Navroad" [NC,OR]
  113. RewriteCond %{HTTP_USER_AGENT} "^NearSite" [NC,OR]
  114. RewriteCond %{HTTP_USER_AGENT} "^Net Vampire" [NC,OR]
  115. RewriteCond %{HTTP_USER_AGENT} "^NetAnts" [NC,OR]
  116. RewriteCond %{HTTP_USER_AGENT} "^NetSpider" [NC,OR]
  117. RewriteCond %{HTTP_USER_AGENT} "^NetZIP" [NC,OR]
  118. RewriteCond %{HTTP_USER_AGENT} "^nutch" [NC,OR]
  119. RewriteCond %{HTTP_USER_AGENT} "^Octopus" [NC,OR]
  120. RewriteCond %{HTTP_USER_AGENT} "^Offline Explorer" [NC,OR]
  121. RewriteCond %{HTTP_USER_AGENT} "^Offline Navigator" [NC,OR]
  122. RewriteCond %{HTTP_USER_AGENT} "^PageGrabber" [NC,OR]
  123. RewriteCond %{HTTP_USER_AGENT} "^Papa Foto" [NC,OR]
  124. RewriteCond %{HTTP_USER_AGENT} "^pavuk" [NC,OR]
  125. RewriteCond %{HTTP_USER_AGENT} "^pcBrowser" [NC,OR]
  126. RewriteCond %{HTTP_USER_AGENT} "^PeoplePal" [NC,OR]
  127. RewriteCond %{HTTP_USER_AGENT} "^planetwork" [NC,OR]
  128. RewriteCond %{HTTP_USER_AGENT} "^psbot" [NC,OR]
  129. RewriteCond %{HTTP_USER_AGENT} "^purebot" [NC,OR]
  130. RewriteCond %{HTTP_USER_AGENT} "^pycurl" [NC,OR]
  131. RewriteCond %{HTTP_USER_AGENT} "^RealDownload" [NC,OR]
  132. RewriteCond %{HTTP_USER_AGENT} "^ReGet" [NC,OR]
  133. RewriteCond %{HTTP_USER_AGENT} "^Rippers 0" [NC,OR]
  134. RewriteCond %{HTTP_USER_AGENT} "^sitecheck\.internetseer\.com" [NC,OR]
  135. RewriteCond %{HTTP_USER_AGENT} "^SiteSnagger" [NC,OR]
  136. RewriteCond %{HTTP_USER_AGENT} "^skygrid" [NC,OR]
  137. RewriteCond %{HTTP_USER_AGENT} "^SmartDownload" [NC,OR]
  138. RewriteCond %{HTTP_USER_AGENT} "^sucker" [NC,OR]
  139. RewriteCond %{HTTP_USER_AGENT} "^SuperBot" [NC,OR]
  140. RewriteCond %{HTTP_USER_AGENT} "^SuperHTTP" [NC,OR]
  141. RewriteCond %{HTTP_USER_AGENT} "^Surfbot" [NC,OR]
  142. RewriteCond %{HTTP_USER_AGENT} "^tAkeOut" [NC,OR]
  143. RewriteCond %{HTTP_USER_AGENT} "^Teleport Pro" [NC,OR]
  144. RewriteCond %{HTTP_USER_AGENT} "^Toata dragostea mea pentru diavola" [NC,OR]
  145. RewriteCond %{HTTP_USER_AGENT} "^turnit" [NC,OR]
  146. RewriteCond %{HTTP_USER_AGENT} "^vikspider" [NC,OR]
  147. RewriteCond %{HTTP_USER_AGENT} "^VoidEYE" [NC,OR]
  148. RewriteCond %{HTTP_USER_AGENT} "^Web Image Collector" [NC,OR]
  149. RewriteCond %{HTTP_USER_AGENT} "^WebAuto" [NC,OR]
  150. RewriteCond %{HTTP_USER_AGENT} "^WebBandit" [NC,OR]
  151. RewriteCond %{HTTP_USER_AGENT} "^WebCopier" [NC,OR]
  152. RewriteCond %{HTTP_USER_AGENT} "^WebFetch" [NC,OR]
  153. RewriteCond %{HTTP_USER_AGENT} "^WebGo IS" [NC,OR]
  154. RewriteCond %{HTTP_USER_AGENT} "^WebLeacher" [NC,OR]
  155. RewriteCond %{HTTP_USER_AGENT} "^WebReaper" [NC,OR]
  156. RewriteCond %{HTTP_USER_AGENT} "^WebSauger" [NC,OR]
  157. RewriteCond %{HTTP_USER_AGENT} "^Website eXtractor" [NC,OR]
  158. RewriteCond %{HTTP_USER_AGENT} "^Website Quester" [NC,OR]
  159. RewriteCond %{HTTP_USER_AGENT} "^WebStripper" [NC,OR]
  160. RewriteCond %{HTTP_USER_AGENT} "^WebWhacker" [NC,OR]
  161. RewriteCond %{HTTP_USER_AGENT} "^WebZIP" [NC,OR]
  162. RewriteCond %{HTTP_USER_AGENT} "^Widow" [NC,OR]
  163. RewriteCond %{HTTP_USER_AGENT} "^WPScan" [NC,OR]
  164. RewriteCond %{HTTP_USER_AGENT} "^WWW\-Mechanize" [NC,OR]
  165. RewriteCond %{HTTP_USER_AGENT} "^WWWOFFLE" [NC,OR]
  166. RewriteCond %{HTTP_USER_AGENT} "^Xaldon WebSpider" [NC,OR]
  167. RewriteCond %{HTTP_USER_AGENT} "^Zeus" [NC,OR]
  168. RewriteCond %{HTTP_USER_AGENT} "^zmeu" [NC,OR]
  169. RewriteCond %{HTTP_USER_AGENT} "360Spider" [NC,OR]
  170. RewriteCond %{HTTP_USER_AGENT} "CazoodleBot" [NC,OR]
  171. RewriteCond %{HTTP_USER_AGENT} "discobot" [NC,OR]
  172. RewriteCond %{HTTP_USER_AGENT} "EasouSpider" [NC,OR]
  173. RewriteCond %{HTTP_USER_AGENT} "ecxi" [NC,OR]
  174. RewriteCond %{HTTP_USER_AGENT} "GT\:\:WWW" [NC,OR]
  175. RewriteCond %{HTTP_USER_AGENT} "heritrix" [NC,OR]
  176. RewriteCond %{HTTP_USER_AGENT} "HTTP\:\:Lite" [NC,OR]
  177. RewriteCond %{HTTP_USER_AGENT} "HTTrack" [NC,OR]
  178. RewriteCond %{HTTP_USER_AGENT} "ia_archiver" [NC,OR]
  179. RewriteCond %{HTTP_USER_AGENT} "id\-search" [NC,OR]
  180. RewriteCond %{HTTP_USER_AGENT} "IDBot" [NC,OR]
  181. RewriteCond %{HTTP_USER_AGENT} "Indy Library" [NC,OR]
  182. RewriteCond %{HTTP_USER_AGENT} "IRLbot" [NC,OR]
  183. RewriteCond %{HTTP_USER_AGENT} "ISC Systems iRc Search 2\.1" [NC,OR]
  184. RewriteCond %{HTTP_USER_AGENT} "LinksCrawler" [NC,OR]
  185. RewriteCond %{HTTP_USER_AGENT} "LinksManager\.com_bot" [NC,OR]
  186. RewriteCond %{HTTP_USER_AGENT} "linkwalker" [NC,OR]
  187. RewriteCond %{HTTP_USER_AGENT} "lwp\-trivial" [NC,OR]
  188. RewriteCond %{HTTP_USER_AGENT} "MFC_Tear_Sample" [NC,OR]
  189. RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [NC,OR]
  190. RewriteCond %{HTTP_USER_AGENT} "Missigua Locator" [NC,OR]
  191. RewriteCond %{HTTP_USER_AGENT} "MJ12bot" [NC,OR]
  192. RewriteCond %{HTTP_USER_AGENT} "panscient\.com" [NC,OR]
  193. RewriteCond %{HTTP_USER_AGENT} "PECL\:\:HTTP" [NC,OR]
  194. RewriteCond %{HTTP_USER_AGENT} "PHPCrawl" [NC,OR]
  195. RewriteCond %{HTTP_USER_AGENT} "PleaseCrawl" [NC,OR]
  196. RewriteCond %{HTTP_USER_AGENT} "SBIder" [NC,OR]
  197. RewriteCond %{HTTP_USER_AGENT} "SearchmetricsBot" [NC,OR]
  198. RewriteCond %{HTTP_USER_AGENT} "Snoopy" [NC,OR]
  199. RewriteCond %{HTTP_USER_AGENT} "Steeler" [NC,OR]
  200. RewriteCond %{HTTP_USER_AGENT} "URI\:\:Fetch" [NC,OR]
  201. RewriteCond %{HTTP_USER_AGENT} "urllib" [NC,OR]
  202. RewriteCond %{HTTP_USER_AGENT} "Web Sucker" [NC,OR]
  203. RewriteCond %{HTTP_USER_AGENT} "webalta" [NC,OR]
  204. RewriteCond %{HTTP_USER_AGENT} "WebCollage" [NC,OR]
  205. RewriteCond %{HTTP_USER_AGENT} "Wells Search II" [NC,OR]
  206. RewriteCond %{HTTP_USER_AGENT} "WEP Search" [NC,OR]
  207. RewriteCond %{HTTP_USER_AGENT} "XoviBot" [NC,OR]
  208. RewriteCond %{HTTP_USER_AGENT} "YisouSpider" [NC,OR]
  209. RewriteCond %{HTTP_USER_AGENT} "zermelo" [NC,OR]
  210. RewriteCond %{HTTP_USER_AGENT} "ZyBorg" [NC,OR]
  211. # End bot locks
  212. # Start blocking by HTTP referer
  213. # RewriteCond %{HTTP_REFERER} "^https?://(?:[^/]+\.)?******\.com" [NC,OR]  (domain redacted in the original post; "******" is not a usable pattern, so substitute the real domain before enabling this line)
  214. RewriteCond %{HTTP_REFERER} "^https?://(?:[^/]+\.)?kambasoft\.com" [NC,OR]
  215. RewriteCond %{HTTP_REFERER} "^https?://(?:[^/]+\.)?savetubevideo\.com" [NC]
  216. # End HTTP referer blocking
  217. RewriteRule ^.* - [F,L]
  218. https://foxcloud.net/ Data center solutions
  219. Virtual Hosting, VPS, Dedicated servers, Cloud storage, Public Cloud (IaaS)
  220. - To block the Pinterest bot, you need to know the exact name of the bot, you need to know the exact name of the bot.
  221. This information you posted appears to have been copied from somewhere, and you made a mistake when editing.
  222. Where did you find it?
  224. These are our internal instructions.
  225. Thanks for pointing out an error while editing.
  228. I was really just interested in knowing if anyone else was experiencing this. I guess not.
  229.  
  230. Looking through the logs, there were nearly 216,000 hits from PinterestBot on Monday, October 28th. By contrast, Googlebot had just over 8,000 hits on this server that day, and Bingbot was at about 5,500.
  231.  
  232. I guess 216,000 hits isn't all that much, but the server is nowhere near full, and even Google is not hammering the server that badly.
  233. Is that spread out among all the sites on the server or specific ones? If specific ones, I wonder if it has a loop of some kind happening. I probably would have blocked it at the server level at that point.
  234.  
  235. EDIT: I checked some servers and they are not getting anywhere near that figure.
  236. Looks to be isolated mostly to specific resellers and their resold accounts. I'm really not sure what they have done to make PinterestBot flood their accounts. I'm really not sure what PinterestBot does... Pinterest isn't a search engine. Not really sure what it's doing.
  237.  
  238. I checked another server:
  239.  
  240. PinterestBot: 172750 hits
  241. Googlebot: 7774
  242. Bingbot: 4721
  243.  
  244. Third server:
  245.  
  246. PinterestBot: 75268
  247. Googlebot: 10584
  248. Bingbot: 7271
  249.  
  250. It just seems like PinterestBot is missing some type of limiting feature that Googlebot and Bingbot both have.
  251.  
  252. Little bit surprised that nobody else is seeing this on their servers. But maybe it's just us.
  253.  
  254. What is typically the threshold for determining that a bot is misbehaving?
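For a rough sense of scale, the hit counts quoted above can be turned into ratios against Googlebot. This is only a sketch: the first server's figures are the rounded values from the post ("nearly 216,000", "just over 8,000", "about 5500"), the server labels are made up, and using Googlebot as the baseline is an assumption, not an established threshold for misbehaviour.

```python
# Hit counts as quoted in the thread (the first server's are approximate).
hits = {
    "server 1": {"PinterestBot": 216000, "Googlebot": 8000, "Bingbot": 5500},
    "server 2": {"PinterestBot": 172750, "Googlebot": 7774, "Bingbot": 4721},
    "server 3": {"PinterestBot": 75268, "Googlebot": 10584, "Bingbot": 7271},
}

def ratio_to_baseline(counts, bot="PinterestBot", baseline="Googlebot"):
    """How many times more requests `bot` made than `baseline`."""
    return counts[bot] / counts[baseline]

for server, counts in hits.items():
    print(f"{server}: PinterestBot made {ratio_to_baseline(counts):.1f}x Googlebot's requests")
```

On these numbers PinterestBot made roughly 27, 22, and 7 times as many requests as Googlebot on the three servers.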
  255. Pinterest crawler
  256.  
  257. To bring everyone the inspiration to create a life they love, we're creating a database of billions of Pins on Pinterest. In order to protect our users and provide the highest quality content, we use web crawlers to help us identify the data on the pages behind the Pins.
  258.  
  259. These pages contain rich signals that enable us to infer better recommendations, fight spam, and display useful information. To take full advantage of these signals, we regularly fetch, store, and process page content associated with Pins.
  260. How Pinterest accesses your site
  261.  
  262. When a genuine Pinterest crawler visits your website, it will send a valid Pinterest User-Agent and connect from a network operated by Pinterest. In addition to respecting the Robots Exclusion Standard, the Pinterest crawler is configured to automatically rate limit concurrent requests made to your website.
  263.  
  264. We recommend that webmasters avoid hard-coding this network's IP addresses in their site configuration, as the addresses that the crawler uses may change without notice.
  265.  
  266. Pinterest's user agent is:
  267.  
  268. Pinterest/0.2 (+https://www.pinterest.com/bot.html)
  269. Mozilla/5.0 (compatible; Pinterestbot/1.0; +https://www.pinterest.com/bot.html)
  270. Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Pinterestbot/1.0; +https://www.pinterest.com/bot.html)
  271.  
  272. Pinterest's IP is dynamic and thus constantly changing, but will always be in the range of: 54.236.1.XXX.
  273. Taken from: https://help.pinterest.com/en/busine...terest-crawler (as posted by steven99)
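The user agent strings quoted above matter for the blocking rule posted earlier: a pattern anchored with ^ only matches at the very start of the string. The sketch below checks three candidate patterns against the quoted strings using Python's re module, on the assumption that it behaves like Apache's regex engine for patterns this simple; the function name is hypothetical.

```python
import re

# The three User-Agent strings quoted from the Pinterest help page.
PINTEREST_UAS = [
    "Pinterest/0.2 (+https://www.pinterest.com/bot.html)",
    "Mozilla/5.0 (compatible; Pinterestbot/1.0; +https://www.pinterest.com/bot.html)",
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 "
    "(compatible; Pinterestbot/1.0; +https://www.pinterest.com/bot.html)",
]

def which_match(pattern):
    """Return one boolean per User-Agent: does the pattern match it?"""
    regex = re.compile(pattern, re.IGNORECASE)  # IGNORECASE mirrors the [NC] flag
    return [bool(regex.search(ua)) for ua in PINTEREST_UAS]

for pattern in ("^PinterestCrawler", "^Pinterest", "Pinterest"):
    print(pattern, which_match(pattern))
```

The example name "^PinterestCrawler" matches none of the three strings, "^Pinterest" matches only the first, and the unanchored "Pinterest" matches all of them.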
  274.  
  275. According to this, the bot should have a rate limiter for concurrent requests. If you do not want to simply block the bot, maybe you could reach out to Pinterest for more information about what is going on. The link provided earlier has a contact button at the bottom.
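Since the quoted page says the crawler respects the Robots Exclusion Standard, a robots.txt rule is another option short of a hard block. The sketch below uses Python's standard urllib.robotparser to check how a candidate rule set would be interpreted; the /private/ path is a made-up example, and whether the real crawler honours the rule in practice is something only Pinterest can confirm.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: keep Pinterestbot out of /private/ only.
ROBOTS_TXT = """\
User-agent: Pinterestbot
Disallow: /private/
"""

def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# can_fetch() takes the crawler's name token, not the full User-Agent header.
for agent, path in [("Pinterestbot", "/private/page"),
                    ("Pinterestbot", "/public/page"),
                    ("Googlebot", "/private/page")]:
    print(agent, path, parser.can_fetch(agent, path))
```

Replacing the Disallow path with / would ask the crawler to stay away from the whole site, which is gentler than returning 403 from .htaccess but depends on the bot cooperating.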