Untitled

a guest Jun 25th, 2019 66 Never
import requests
from bs4 import BeautifulSoup as bs


class Spider(object):
    def __init__(self):
        self.url = 'https://www.murrengan.ru/murrs/'
        self.headers = {
            'accept': '*/*',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
        }

    def get_html(self):
        # Fetch the page at self.url; return the raw bytes, or None on a non-200 response.
        with requests.Session() as session:
            response = session.get(self.url, headers=self.headers, timeout=10)
            if response.status_code == 200:
                return response.content
            print(f"Error: {response.status_code}")
            return None

    def parse_html(self, html):
        # Collect the author name from every 'murr-card' block.
        usr = []
        if html is None:
            return usr
        soup = bs(html, 'lxml')
        divs = soup.find_all('div', {'class': 'murr-card'})
        for div in divs:
            author = div.find('a', {'class': 'profile__name'})
            if author is not None:  # skip cards without a profile link
                usr.append(author.text)
        return usr


if __name__ == '__main__':
    obj = Spider()
    html = obj.get_html()
    print(obj.parse_html(html))
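
The extraction step can be tried offline, without requests, bs4 or lxml. The sketch below swaps BeautifulSoup for the standard library's html.parser and runs against a made-up HTML sample. The names AuthorParser, extract_authors and SAMPLE_HTML are hypothetical, and the sample markup is only an assumption based on the class names used in the script, not the real page structure. Unlike parse_html, which takes the first profile link per card, this version collects every matching link inside a card and strips surrounding whitespace.

```python
from html.parser import HTMLParser


class AuthorParser(HTMLParser):
    """Collect the text of <a class="profile__name"> links found inside <div class="murr-card">."""

    def __init__(self):
        super().__init__()
        self.authors = []
        self._card_depth = 0    # > 0 while inside a murr-card div (counts nested divs)
        self._in_name = False   # True while inside a matching <a>
        self._buf = []          # text fragments of the current link

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            # Enter a card, or track a nested div inside one.
            if self._card_depth or "murr-card" in classes:
                self._card_depth += 1
        elif tag == "a" and self._card_depth and "profile__name" in classes:
            self._in_name = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "div" and self._card_depth:
            self._card_depth -= 1
        elif tag == "a" and self._in_name:
            self.authors.append("".join(self._buf).strip())
            self._in_name = False

    def handle_data(self, data):
        if self._in_name:
            self._buf.append(data)


# Hypothetical sample markup, assumed from the class names in the script.
SAMPLE_HTML = """
<div class="murr-card"><a class="profile__name" href="#">alice</a></div>
<div class="murr-card"><p>no author link here</p></div>
<div class="murr-card"><div><a class="profile__name">bob</a></div></div>
<a class="profile__name">outside any card</a>
"""


def extract_authors(html):
    parser = AuthorParser()
    parser.feed(html)
    parser.close()
    return parser.authors


print(extract_authors(SAMPLE_HTML))  # ['alice', 'bob']
```

The second card has no profile link and is skipped rather than raising an error, and the link outside any card is ignored.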