WebsiteCrawlerUpdate

Mr_whitespace Aug 11th, 2018
from bs4 import BeautifulSoup
import requests
import sys
from time import sleep


def writetoDoc(docname, data):
    # Save the parsed data to a new file; the with-block closes it automatically.
    with open(docname, 'w+') as document:
        document.write(str(data))


def Crawlweb(url, tags):
    # Fetch the page and parse out every element matching the given tag.
    website = requests.get(url)
    content = website.text

    soup = BeautifulSoup(content, 'html.parser')
    final = soup.find_all(tags)
    print(final)
    write = input('Do you want to write the parsed data to a file? ')
    if write.lower() in ('yes', 'y'):
        name = input('Enter a name for the new file: ')
        writetoDoc(name, final)
        print('Data has been written to the file')
        sleep(1)
        print('Quitting...')
        sys.exit()
    elif write.lower() in ('no', 'n'):
        print('Quitting...')
        sys.exit()
    else:
        print('Error: unrecognised answer')


url = input('Enter the URL of the website that you want data from: ')
if not url.startswith('https://'):
    url = 'https://www.' + url
    print(url)
tag = input('What tag do you want info from? ')
Crawlweb(url, tag)
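Because the script drives everything through interactive prompts, the core parsing step is easiest to see in isolation. A minimal, non-interactive sketch of what `Crawlweb` does with its tag argument (the HTML snippet here is made up for illustration, standing in for the `requests.get(url).text` response body):

```python
from bs4 import BeautifulSoup

# Stand-in for the page body that requests.get(url).text would return.
html = """
<html><body>
  <p>first paragraph</p>
  <p>second paragraph</p>
  <a href="https://example.com">a link</a>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')
# Same call Crawlweb makes: collect every element with the given tag name.
paragraphs = soup.find_all('p')
print([p.get_text() for p in paragraphs])
# → ['first paragraph', 'second paragraph']
```

`find_all` returns a list of `Tag` objects, which is why the script wraps the result in `str(...)` before writing it to a file; calling `get_text()` on each element instead would keep only the visible text.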