a guest
Dec 17th, 2017
2017-12-17 14:20:27 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6025
2017-12-17 14:21:27 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-12-17 14:22:27 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-12-17 14:22:38 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://fr.example.com/robots.txt> (failed 1 times): TCP connection timed out: 110: Connection timed out.

import scrapy
import itertools

class SomeSpider(scrapy.Spider):
    name = 'some'
    # allowed_domains entries must be bare domain names, not URLs
    allowed_domains = ['fr.example.com']

    def start_requests(self):
        categories = ['thing1', 'thing2', 'thing3']
        base = "https://fr.example.com/things?t={category}&p={index}"

        # One request per (category, page) pair: pages 1..10 for each category
        for category, index in itertools.product(categories, range(1, 11)):
            yield scrapy.Request(base.format(category=category, index=index))

    def parse(self, response):
        response.selector.remove_namespaces()
        info1 = response.css("span.info1").extract()
        info2 = response.css("span.info2").extract()

        # Pair the two lists element-wise and emit one item per pair
        for item in zip(info1, info2):
            scraped_info = {
                'info1': item[0],
                'info2': item[1],
            }

            yield scraped_info
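Separately from the connection timeout, the URL generation in start_requests can be sanity-checked in isolation. A minimal sketch, reusing the same placeholder categories and URL template as the spider above:

```python
import itertools

# Same placeholder values as in the spider
categories = ['thing1', 'thing2', 'thing3']
base = "https://fr.example.com/things?t={category}&p={index}"

# itertools.product yields every (category, page) combination:
# 3 categories x 10 pages = 30 URLs
urls = [base.format(category=c, index=i)
        for c, i in itertools.product(categories, range(1, 11))]

print(len(urls))   # 30
print(urls[0])     # https://fr.example.com/things?t=thing1&p=1
```

product iterates the rightmost argument fastest, so all ten pages of 'thing1' are requested before 'thing2' begins.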