overloop

notes on krautchan.net grabbing

Apr 1st, 2013
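# List every root-relative src= target (images, scripts) referenced by the saved pages.
grep -Poh "(?<=src=\")/[^\"]*" *.html | sort -u
# List every href= target, keeping only links to thread pages.
grep -Poh "(?<=href=\")[^\"]*" *.html | sort -u | grep "thread.*\.html$"
# Make thumbnail paths relative. Note: this pattern only matches unquoted src= attributes.
sed -i "s,src=/thumbnails,src=thumbnails,g" *.html

# Fetch the stylesheet and point the pages at the local copy.
wget http://krautchan.net/css/style.css
sed -i "s,href=\"/css/style\.css,href=\"style.css,g" *.html
# Make /int/ thread links relative.
sed -i "s,href=\"/int/thread-,href=\"thread-,g" *.html

# Make links between the board index pages relative (pages 1 to 20 only).
sed -i "s,/int/\([0-9]*\)\.html,\1.html,g" {1..20}.html

# Source: http://krautchan.net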
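
The link-rewriting steps above could be collected into one reusable function. This is only a rough sketch: `localize_page` and its `board` argument are hypothetical names that do not appear in the original notes, and it assumes GNU sed (for `-i`), double-quoted `href` attributes, and a board name without commas or regex metacharacters. The thumbnail substitution is left out on purpose.

```shell
#!/usr/bin/env bash
# Sketch only: localize_page is a hypothetical helper, not part of the original notes.
# Assumes GNU sed (-i edits files in place) and double-quoted href attributes.

# Usage: localize_page BOARD FILE...
# Rewrites absolute stylesheet, thread and index links so they resolve locally.
localize_page() {
    local board="$1"   # board name, e.g. "int" (assumed to contain no commas)
    shift
    sed -i \
        -e 's,href="/css/style\.css,href="style.css,g' \
        -e "s,href=\"/${board}/thread-,href=\"thread-,g" \
        -e "s,/${board}/\([0-9][0-9]*\)\.html,\1.html,g" \
        "$@"
}
```

It would be called as `localize_page int *.html` from the directory that holds the saved pages. Passing the board name as an argument keeps the same three substitutions usable for boards other than /int/.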