## Outputs saved at https://bit.ly/qscrawl_samplop ##
#### [copy to own drive to interact with filters] ##
#### [json files saved as rows in first sheet] ##

from queue_scrawler_reqs import * ## download or paste from https://pastebin.com/TBtYja5D

######################## FIRST TIME ########################
setGlobals({'starterUrl': 'https://en.wikipedia.org/wiki/Special:Random'})
nextUrl = get_next_fromScrawlQ()
while nextUrl:
    nextUrl = logScrape(scrapeUrl(nextUrl))
saveScrawlSess('qScrawl1.csv', 'vScrawl1.json')
############################################################

######################## NEXT TIME ########################
loadScrawlSess('qScrawl1.csv', 'vScrawl1.json', 'q<--page_limit_exceeded')
nextUrl = get_next_fromScrawlQ()
while nextUrl:
    nextUrl = logScrape(scrapeUrl(nextUrl))
saveScrawlSess('qScrawl2.csv', 'vScrawl2.json')
###########################################################
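
The script above relies on helpers from queue_scrawler_reqs, which are not shown in this paste. As a rough illustration of the same pattern (drain a URL queue, stop at a page limit, save the session, resume later), here is a minimal self-contained sketch. Everything in it is an assumption made for illustration: `FAKE_WEB`, `crawl`, `save_session`, and `load_session` are hypothetical names, the "web" is an in-memory dict rather than real HTTP requests, and the single JSON session file stands in for the CSV/JSON pair used above. It is not the implementation behind `scrapeUrl`, `logScrape`, `saveScrawlSess`, or `loadScrawlSess`.

```python
import json
import os
import tempfile
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def crawl(queue, visited, fetch_links, page_limit):
    """Scrape queued URLs until the queue is empty or page_limit pages are done."""
    scraped = 0
    while queue and scraped < page_limit:
        url = queue.popleft()
        if url in visited:
            continue  # already scraped in this or an earlier session
        visited[url] = fetch_links(url)
        scraped += 1
        # Queue newly discovered links that are neither scraped nor pending.
        for link in visited[url]:
            if link not in visited and link not in queue:
                queue.append(link)
    return queue, visited


def save_session(path, queue, visited):
    """Write the pending queue and the scraped pages to one JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"queue": list(queue), "visited": visited}, f)


def load_session(path):
    """Restore the pending queue and scraped pages from a saved session."""
    with open(path, encoding="utf-8") as f:
        state = json.load(f)
    return deque(state["queue"]), state["visited"]


# First run: start from one URL and stop after 2 pages.
session_path = os.path.join(tempfile.mkdtemp(), "session.json")
queue, visited = crawl(deque(["a"]), {}, FAKE_WEB.__getitem__, page_limit=2)
first_run_pending = list(queue)  # URLs left over when the limit was hit
save_session(session_path, queue, visited)

# Next run: reload the saved session and continue from the leftover queue.
queue, visited = load_session(session_path)
queue, visited = crawl(queue, visited, FAKE_WEB.__getitem__, page_limit=10)
```

Keeping the pending queue and the visited record in the saved state is what lets the second run pick up where the first one stopped instead of starting over from the starter URL.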