The script below downloads the URLhaus URL list and creates a txt file you can import into OpenDNS or other content-blocking applications. The script keeps the previous history of domains, so the next time you run it, it creates a differential of only the new hosts. I have been running this for 5 months and it has been a huge benefit for blocking in OpenDNS.
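If you want it to run unattended, a daily Task Scheduler entry is enough. A minimal sketch, assuming you saved the script as C:\scripts\urlhaus-diff.bat (the path and task name are placeholders):

schtasks /Create /TN "URLhaus Update" /TR "C:\scripts\urlhaus-diff.bat" /SC DAILY /ST 03:00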
@echo off
:: Requires the Cygwin utils gawk grep cut diff sed sort wget
:: https://cygwin.com/install.html
:: Be aware that the built-in Windows sort will not work; either put the Cygwin bin directory
:: first in your PATH or edit the sort line below to use the full path to the Cygwin sort.
::
:: This script downloads the URLhaus list of URLs, compares it with your previous download so that
:: it only creates a file of the differences, and excludes any domains you want whitelisted.
:: To whitelist a domain, go to the WHITELIST DOMAIN comment below.
:: At the end of a run you will have a file called URLHausDomains.<date>_<time>.txt. The first time
:: you run it, it will contain all domains listed in URLhaus. Don't delete uniqdomainsold.txt.
::
echo.
for /f "tokens=1,2" %%u in ('date /t') do set d=%%v
for /f "tokens=1" %%u in ('echo %time%') do set t=%%u
if "%t:~1,1%"==":" set t=0%t%
set timestr=%d:~6,4%%d:~0,2%%d:~3,2%_%t:~0,2%%t:~3,2%
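:: A worked example of what the four lines above build, assuming a US-style
:: locale (adjust the substring offsets if your date /t prints a different format):
::   date /t  -> "Tue 01/14/2020"  => d=01/14/2020
::   %time%   -> " 9:32:11.45"     => t=09:32:11.45 (leading zero added by the if line)
::   timestr  -> 20200114_0932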
echo Downloading new URLhaus list
wget -O urlhauscurrent.txt https://urlhaus.abuse.ch/downloads/csv/
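:: Each data row of the CSV looks roughly like this (illustrative line; the script
:: assumes the URL is the 3rd comma-separated field, so check the header comments
:: in the downloaded file if the layout ever changes):
::   "2440xx","2020-01-14 09:30:02","http://badhost.example/payload.exe","online",...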
echo.
echo Cleaning up the file by removing IPs, duplicates etc.
:: Note: cmd passes single quotes through literally, so the gawk/grep/sed patterns must be double-quoted
cut -d, -f3 urlhauscurrent.txt >refined.txt
gawk -F/ "{print $3}" <refined.txt >domains.txt
grep -v "^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}$" domains.txt >onlydomains.txt
grep -v : onlydomains.txt >noipdomains.txt
sort -u <noipdomains.txt >uniqd.txt
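:: A worked trace of the pipeline above for one hypothetical row:
::   "2440xx","2020-01-14 09:30:02","http://badhost.example:8080/payload.exe","online",...
::   cut -d, -f3      -> "http://badhost.example:8080/payload.exe"  (field 3)
::   gawk -F/         -> badhost.example:8080   (3rd /-separated field is the host)
::   grep -v IP-regex -> kept (it is not a bare IPv4 address)
::   grep -v :        -> dropped (any host:port entry is discarded entirely)
::   sort -u          -> whatever survives is deduplicated into uniqd.txt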
echo.
echo.
echo Removing any domains you want to whitelist
echo.
:: WHITELIST DOMAIN
:: Add any domain you want whitelisted to the grep line below, separated from the others with \|
grep -v ".amazon.com\|.pardot.com\|.dropbox.com\|.google.com\|.amazonaws.com\|.microsoft.com\|.sharepoint.com\|.mf\|.ms\|.1drv.com\|.ac.th\|.box.com\|.boxcloud.com\|.cudasvc.com\|.dropboxusercontent.com\|.github.com\|.githubusercontent.com\|.go.th\|.googleapis.com\|.googleusercontent.com\|.jquery.com\|.live.com\|.naver.com\|.onedrive.com\|.outlook.com\|pardot.com\|.salesforce.com\|.windows.net\|yimg.com" uniqd.txt >uniqdomainscurrent.txt
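:: Note: the dots in the patterns above are not escaped, so each one matches any
:: single character, and short patterns like .mf or .ms will also remove any
:: domain that merely contains "mf" or "ms". Escape the dots (\.) and anchor the
:: patterns if you need stricter matching.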
:: Create an empty uniqdomainsold.txt on the first run so diff has something to compare against
if not exist uniqdomainsold.txt type nul >uniqdomainsold.txt
diff uniqdomainscurrent.txt uniqdomainsold.txt | grep "<" >newdomains.txt
cut -d" " -f2 newdomains.txt >newdomainscut.txt
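:: diff prefixes lines that exist only in the current file with "< ", e.g.:
::   < badhost.example
:: grep keeps those lines, and cut -d" " -f2 strips the "< " marker, leaving
:: just the new domains.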
echo.
echo Stripping www. from domains
sed "s/^www\.//g" <newdomainscut.txt >URLHausDomains.%timestr%.txt
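:: Example: www.badhost.example becomes badhost.example, so the blocker can match
:: the bare domain (which typically also covers its subdomains) rather than just
:: the www host.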
:: Save the current list so the next run diffs against it
copy uniqdomainscurrent.txt uniqdomainsold.txt
echo.
echo.
echo Check URLHausDomains.%timestr%.txt to see if any new domains exist since you last ran an update
:: On the first run, all of the domains will be in uniqdomainscurrent.txt (and in the
:: URLHausDomains.<date>_<time>.txt file, thanks to the first-run guard above).
:: On later runs, only the differential since the last run will be in the URLHausDomains.<date> file.
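After a couple of runs the working directory will contain the intermediate files plus one dated output per run, something like this (timestamp from a hypothetical run):

urlhauscurrent.txt refined.txt domains.txt onlydomains.txt noipdomains.txt uniqd.txt
uniqdomainscurrent.txt uniqdomainsold.txt newdomains.txt newdomainscut.txt
URLHausDomains.20200114_0932.txt

Only the URLHausDomains files matter for the import; everything else is regenerated on the next run. Just don't delete uniqdomainsold.txt between runs, since it is the history the diff compares against.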