Untitled

Posted by a guest on Feb 22nd, 2024.
#!/bin/bash

USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
SAVE_HOST="www.vice.com"
WARC_NAME="www.vice.com-panicgrab-20220222"
AUTHOR_SLUG="author-name"  # <<< CHANGE THIS to the slug from the author's contributor URL

# <<< CHANGE THIS range from 1..42 to however many listing pages the author has
for X in {1..42}; do
  # Recurse one level from the listing page (-r -l 1), following URLs that match
  # "article", fetch page requisites, span only to the hosts given in -D, and
  # record every response into a per-page WARC with a CDX index.
  wget \
    -e robots=off -r -l 1 --page-requisites --accept-regex=article \
    -D "$SAVE_HOST",video-images.vice.com -H \
    --waitretry 5 --timeout 60 --tries 5 --wait 1 -k \
    --warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME.$X" \
    -U "$USER_AGENT" "https://$SAVE_HOST/en/contributor/$AUTHOR_SLUG?page=$X"  # quoted so "?" is never glob-expanded
done

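Editing AUTHOR_SLUG and the page range by hand for every author gets tedious. Below is a minimal sketch, assuming bash, that wraps the same wget invocation in a function taking the slug and page count as arguments. The function names `grab_author` and `contributor_url`, the `DRY_RUN` switch, and putting the slug into the WARC file name (so runs for different authors get distinct file names) are hypothetical additions, not part of the original paste; the wget flags themselves are unchanged.

```shell
#!/bin/bash
# Sketch only: a parameterized variant of the loop above.
# grab_author, contributor_url and DRY_RUN are hypothetical names.

USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
SAVE_HOST="www.vice.com"
WARC_NAME="www.vice.com-panicgrab-20220222"

# Print the listing URL for page $2 of contributor $1.
contributor_url() {
  printf 'https://%s/en/contributor/%s?page=%d\n' "$SAVE_HOST" "$1" "$2"
}

# grab_author SLUG PAGES: crawl listing pages 1..PAGES for one contributor.
grab_author() {
  local slug="$1" pages="$2" x
  # Reject an empty slug or a page count that is not a positive integer.
  if [[ -z "$slug" || ! "$pages" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: grab_author AUTHOR_SLUG PAGES" >&2
    return 2
  fi
  # With DRY_RUN=1, prefix each command with echo so it is printed, not run.
  local run=()
  [[ "${DRY_RUN:-0}" == 1 ]] && run=(echo)
  for ((x = 1; x <= pages; x++)); do
    "${run[@]}" wget \
      -e robots=off -r -l 1 --page-requisites --accept-regex=article \
      -D "$SAVE_HOST",video-images.vice.com -H \
      --waitretry 5 --timeout 60 --tries 5 --wait 1 -k \
      --warc-header "operator: Archive Team" --warc-cdx \
      --warc-file="$WARC_NAME.$slug.$x" \
      -U "$USER_AGENT" "$(contributor_url "$slug" "$x")"
  done
}
```

For example, after sourcing the file, `DRY_RUN=1 grab_author author-name 3` prints the three wget commands without fetching anything, which is a cheap way to check the page range before a long crawl; `grab_author author-name 42` then runs it for real.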