krlaboratories

use .htaccess to block bad spiders and crawlers

Oct 30th, 2021 (edited)
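# Redirect bad bots to one page
RewriteEngine on
# Each [OR] joins a condition to the next one, so any User-Agent below matches (NC = case-insensitive)
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Twitterbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MetaURI [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mediawords [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlipboardProxy [NC]
# ...AND the request is not already for the target page (prevents a redirect loop)
RewriteCond %{REQUEST_URI} !^/nocrawler\.htm$
# A full URL triggers an external redirect; R=302 only makes the default status explicit
RewriteRule .* http://yoursite/nocrawler.htm [R=302,L]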
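The six User-Agent conditions can also be folded into a single regular-expression alternation. This works because RewriteCond patterns are unanchored, so with [NC] each one is a case-insensitive substring search.

The sketch below performs the same redirect in that form. It keeps the paste's placeholder host yoursite and page nocrawler.htm, which you would replace with your own.

```apache
# Same redirect, with the bot list folded into one case-insensitive regex
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} (facebookexternalhit|Twitterbot|Baiduspider|MetaURI|mediawords|FlipboardProxy) [NC]
# Do not redirect requests that already target the landing page
RewriteCond %{REQUEST_URI} !^/nocrawler\.htm$
RewriteRule .* http://yoursite/nocrawler.htm [R=302,L]
```

Adding a bot then means editing one pattern instead of juggling [OR] flags. It also avoids the classic mistake of leaving [OR] on the last User-Agent line.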

OR

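# Block bad bots with a 403
# Tag any request whose User-Agent contains one of these strings (case-insensitive)
SetEnvIfNoCase User-Agent "facebookexternalhit" bad_bot
SetEnvIfNoCase User-Agent "Twitterbot" bad_bot
SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "MetaURI" bad_bot
SetEnvIfNoCase User-Agent "mediawords" bad_bot
SetEnvIfNoCase User-Agent "FlipboardProxy" bad_bot

# Apache 2.2-style access control (on Apache 2.4 this requires mod_access_compat).
# With Order Allow,Deny, a request matching both Allow and Deny is denied.
# <Limit> applies these rules only to the listed methods.
<Limit GET POST HEAD>
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot
</Limit>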
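The Order/Allow/Deny directives are Apache 2.2 syntax, and on Apache 2.4 they only work through mod_access_compat. The sketch below is the equivalent using the Apache 2.4 Require directives from mod_authz_core. It reuses the bad_bot variable set by the SetEnvIfNoCase lines and rests on two assumptions:

- the server runs Apache 2.4 or later;
- the host's AllowOverride setting includes AuthConfig, so Require is permitted in .htaccess.

```apache
# Apache 2.4+: allow everyone except requests tagged bad_bot
# (bad_bot is set by the SetEnvIfNoCase lines)
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

Unlike the <Limit GET POST HEAD> wrapper, this applies to every request method. Use it instead of the Order/Allow/Deny block, not alongside it, because mixing old and new access directives in the same context can give unexpected results.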