DarthInvader

URLHaus URL to domain script and tracker

Mar 8th, 2019
The script below downloads the URLhaus URL list and creates a txt file you can import into OpenDNS or other content-blocking applications. The script keeps a history of previously seen domains, so the next time you run it, it produces a differential containing only the new hosts. I have been running this for five months and it has been a huge benefit for blocking in OpenDNS.
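To see what the cleanup stage of the script does, here is how one CSV row is reduced to a bare domain using the same cut/awk/sed chain the script runs (the sample row and its field layout are illustrative, not copied from the live URLhaus feed):

```shell
# Hypothetical URLhaus CSV row; field 3 is the malicious URL.
row='123456,"2019-03-08 10:00:00","http://www.evil.example.com/payload.exe","online"'

# Field 3 of the comma-separated row -> the quoted URL
url=$(printf '%s\n' "$row" | cut -d, -f3)

# Third '/'-separated field of the URL -> the host part
host=$(printf '%s\n' "$url" | awk -F/ '{print $3}')

# Strip a leading www. so www and bare variants dedupe to one entry
printf '%s\n' "$host" | sed 's/^www\.//'
# → evil.example.com
```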

@echo off
:: Requires Cygwin utils: gawk grep cut diff sed sort wget
:: https://cygwin.com/install.html
:: Be aware that the Windows sort will not work; if needed, edit the sort line below to use the full path to the Cygwin sort.
::
:: This script downloads the URLhaus list of URLs, compares it with your previous download so that only the differences are written out, and excludes any domains you want whitelisted.
:: To whitelist a domain, go to the WHITELIST DOMAIN comment below.
:: At the end of a run you will have a file called URLHausDomains.currentdate.time.txt.
:: The first time you run it, that file will contain all domains listed in URLhaus. Don't delete uniqdomainsold.txt.
::
echo .
for /f "tokens=1,2" %%u in ('date /t') do set d=%%v
for /f "tokens=1" %%u in ('echo %time%') do set t=%%u
if "%t:~1,1%"==":" set t=0%t%
set timestr=%d:~6,4%%d:~0,2%%d:~3,2%_%t:~0,2%%t:~3,2%

echo Downloading new URLhaus list

wget -O urlhauscurrent.txt https://urlhaus.abuse.ch/downloads/csv/
echo .
echo Cleaning up file by removing IPs, duplicates etc.
cut -d, -f3 urlhauscurrent.txt >refined.txt
gawk -F/ '{print $3}' <refined.txt >domains.txt
grep -v '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}$' domains.txt >onlydomains.txt
grep -v : onlydomains.txt >noipdomains.txt
sort -u <noipdomains.txt >uniqd.txt
echo .
echo .
echo Removing any domains you want to whitelist
echo .
:: WHITELIST DOMAIN
:: Add any domain you want whitelisted to the grep line below
grep -v ".amazon.com\|.pardot.com\|.dropbox.com\|.google.com\|.amazonaws.com\|.microsoft.com\|.sharepoint.com\|.mf\|.ms\|.1drv.com\|.ac.th\|.box.com\|.boxcloud.com\|.cudasvc.com\|.dropboxusercontent.com\|.github.com\|.githubusercontent.com\|.go.th\|.googleapis.com\|.googleusercontent.com\|.jquery.com\|.live.com\|.naver.com\|.onedrive.com\|.outlook.com\|pardot.com\|.salesforce.com\|.windows.net\|yimg.com" uniqd.txt >uniqdomainscurrent.txt
:: On the first run there is no previous download yet; create an empty history file so diff reports every domain as new
if not exist uniqdomainsold.txt type nul >uniqdomainsold.txt
diff uniqdomainscurrent.txt uniqdomainsold.txt | grep "<" >newdomains.txt
cut -d' ' -f2 newdomains.txt >newdomainscut.txt
echo .
echo Stripping www from domains
sed 's/^www\.//' <newdomainscut.txt >URLHausDomains.%timestr%.txt
copy uniqdomainscurrent.txt uniqdomainsold.txt
echo .
echo .
echo Check URLHausDomains.%timestr%.txt to see if any new domains exist since you last ran an update
:: The first time you run it, all of the domains will be in uniqdomainscurrent.txt
:: Going forward, the differential from the last time you ran the script will be in the URLHausDomains.date file
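
The differential step relies on diff marking lines that exist only in the first file with "<". A minimal demonstration of that diff | grep | cut chain (the two file names and their contents here are throwaway examples):

```shell
# Two small domain lists: current.txt has one domain old.txt lacks.
printf 'a.example\nb.example\nc.example\n' > current.txt
printf 'a.example\nc.example\n' > old.txt

# diff prefixes lines unique to the first file with "< ";
# grep keeps those lines and cut drops the "< " marker.
diff current.txt old.txt | grep '<' | cut -d' ' -f2
# → b.example
```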