apl-mhd

author spider

Jun 25th, 2019
import scrapy


# The class statement was missing from the paste; the name AuthorSpider is an assumption.
class AuthorSpider(scrapy.Spider):
    name = 'author'

    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # follow links to author pages
        for href in response.css('.author + a::attr(href)'):
            yield response.follow(href, self.parse_author)

        # follow pagination links
        for href in response.css('li.next a::attr(href)'):
            yield response.follow(href, self.parse)

    def parse_author(self, response):
        def extract_with_css(query):
            # first match for the query, stripped; empty string if nothing matches
            return response.css(query).get(default='').strip()

        yield {
            'name': extract_with_css('h3.author-title::text'),
            'birthdate': extract_with_css('.author-born-date::text'),
            'bio': extract_with_css('.author-description::text'),
        }