ripping pics from asiachan

a guest
Aug 3rd, 2017
dependencies:

conemu (or linux)
wget
grep
sed
fnr (the "find and replace.exe" used below)
aria2c

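Quick way to check the tools are on PATH before starting. A minimal sketch for the linux route, assuming a POSIX shell; `missing_tools` is a made-up helper name, and fnr is left out because it is a Windows program.

```shell
# Print the name of every given tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Prints nothing when all four tools are installed.
missing_tools wget grep sed aria2c
```
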
# spider asiachan listing pages 1-30, one wget log per page
# (inside a .bat file write %%i instead of %i)

FOR /l %i in (1,1,30) DO wget -v -e robots=off -r --spider --level=1 "http://kpop.asiachan.com/IU?d=2&p=%i" --output-file="C:\Users\HTPC\Desktop\TESTFOLDER\URLS%i.csv"

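On linux the cmd FOR loop can be written as a shell loop. A rough sketch, assuming a POSIX shell; `page_urls`, `spider_pages` and the output directory argument are made-up names, the wget flags are the ones from the command above.

```shell
# Print the listing-page URLs for pages 1..N (N defaults to 30).
page_urls() {
    last="${1:-30}"
    i=1
    while [ "$i" -le "$last" ]; do
        printf 'http://kpop.asiachan.com/IU?d=2&p=%s\n' "$i"
        i=$((i + 1))
    done
}

# Spider every listing page and write one wget log per page into the
# directory given as $1. Only defined here, not called, because it
# hits the network.
spider_pages() {
    out_dir="$1"
    n=0
    page_urls "${2:-30}" | while read -r url; do
        n=$((n + 1))
        wget -v -e robots=off -r --spider --level=1 "$url" \
            --output-file="$out_dir/URLS$n.csv"
    done
}
```

Usage would be something like `spider_pages "$HOME/TESTFOLDER"`, which should leave URLS1.csv to URLS30.csv in that folder.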
# join logs into one txt file (/b stops copy from appending an end-of-file character)

copy /b "C:\Users\HTPC\Desktop\TESTFOLDER\*.csv" "C:\Users\HTPC\Desktop\TESTFOLDER\URLS1.txt"

# delete .csv leftovers

del "C:\Users\HTPC\Desktop\TESTFOLDER\*.csv"

# filter page url (keep only log lines with http://kpop.asiachan.com/<number>)

grep -e "http://kpop\.asiachan\.com/[0-9]" "C:\Users\HTPC\Desktop\TESTFOLDER\URLS1.txt" > "C:\Users\HTPC\Desktop\TESTFOLDER\URLS2.txt"

# remove date stamp (the "--YYYY-MM-DD HH:MM:SS--  " prefix wget puts before each url)

sed -e "s/^--[0-9-]* [0-9:]*--  //" "C:\Users\HTPC\Desktop\TESTFOLDER\URLS2.txt" > "C:\Users\HTPC\Desktop\TESTFOLDER\URLS3.txt"

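What the filter and date stamp steps do to a log, shown on a small sample. A minimal sketch, assuming wget log lines look like `--YYYY-MM-DD HH:MM:SS--  URL`; the sample lines, the image id and the name `extract_page_urls` are made up for illustration.

```shell
# Keep only image-page lines from a wget log on stdin and strip the
# leading "--date time--  " stamp, leaving bare page URLs.
extract_page_urls() {
    grep -e 'http://kpop\.asiachan\.com/[0-9]' |
        sed -e 's/^--[0-9-]* [0-9:]*--  //'
}

# Made-up log excerpt: one listing page, one image page logged twice.
sample_log='--2017-08-03 12:00:01--  http://kpop.asiachan.com/IU?d=2&p=1
Spider mode enabled. Check if remote file exists.
--2017-08-03 12:00:02--  http://kpop.asiachan.com/123456
--2017-08-03 12:00:03--  http://kpop.asiachan.com/123456'

printf '%s\n' "$sample_log" | extract_page_urls
# prints http://kpop.asiachan.com/123456 twice
```

The listing page line is dropped because its path starts with a letter, not a digit.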
# remove duplicates (keeps every second line, so it assumes wget logged each url exactly twice in a row)

sed -n "n;p" "C:\Users\HTPC\Desktop\TESTFOLDER\URLS3.txt" > "C:\Users\HTPC\Desktop\TESTFOLDER\URLS4.txt"

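Keeping every second line loses urls when a url shows up only once or three times in the log. An alternative that does not care how often a line repeats, assuming awk is available (it is not in the dependency list); `dedupe_urls` and the sample urls are made up.

```shell
# Drop repeated lines from stdin, keeping the first occurrence of each
# and preserving the original order. Needs awk (an extra dependency).
dedupe_urls() {
    awk '!seen[$0]++'
}

# Made-up input: one url logged once, one logged twice.
printf '%s\n' \
    'http://kpop.asiachan.com/111' \
    'http://kpop.asiachan.com/222' \
    'http://kpop.asiachan.com/222' | dedupe_urls
# prints the 111 url, then the 222 url
```

On the same input the every-second-line approach would print only the 222 url.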
# replace page url with image url (edits URLS4.txt in place)

"C:\Users\HTPC\Desktop\Utilities\find and replace.exe" --cl --dir "C:\Users\HTPC\Desktop\TESTFOLDER" --fileMask "URLS4.txt" --find "http://kpop.asiachan.com/" --replace "http://static.asiachan.com/IU.full."

# add .jpg suffix

sed -e "s/$/.jpg/" "C:\Users\HTPC\Desktop\TESTFOLDER\URLS4.txt" > "C:\Users\HTPC\Desktop\TESTFOLDER\URLS5.txt"

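On linux, where fnr is not available, the url replace and the .jpg suffix can be done in one sed call. A minimal sketch, assuming every image lives at `http://static.asiachan.com/IU.full.<id>.jpg` as the two steps above imply; `to_image_urls` and the sample id are made up.

```shell
# Turn page urls on stdin into full-size image urls.
to_image_urls() {
    sed -e 's|^http://kpop\.asiachan\.com/|http://static.asiachan.com/IU.full.|' \
        -e 's/$/.jpg/'
}

printf '%s\n' 'http://kpop.asiachan.com/123456' | to_image_urls
# prints http://static.asiachan.com/IU.full.123456.jpg
```

For the real list it would be run as `to_image_urls < URLS4.txt > URLS5.txt`.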
# download from url

aria2c --file-allocation=none -c -x 10 -s 10 --input-file="C:\Users\HTPC\Desktop\TESTFOLDER\URLS5.txt" --dir="C:\Users\HTPC\Desktop\iu3"