Untitled

Posted by a guest, Sep 30th, 2020
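Downloader middleware (middlewares_custom.MyCustomMiddleware):

from scrapy.http import HtmlResponse

# `browser_api` is the author's browser-automation object (not shown in the paste).


class MyCustomMiddleware:
    def process_request(self, request, spider):
        # Load and render the page in a real browser.
        browser_api.open_page(request.url)
        browser_api.wait_for_page_to_be_loaded()

        # Returning a Response here skips Scrapy's own download of the request;
        # the rendered HTML is passed on as if it had been downloaded.
        return HtmlResponse(
            browser_api.current_url(),
            body=browser_api.get_page_html(),
            encoding='utf-8',
            request=request,
        )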
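Returning a response from `process_request()` changes what the rest of the downloader-middleware chain does. The block below is a simplified pure-Python model of the ordering described in Scrapy's documentation, not Scrapy's own code. `process_request()` runs in ascending priority order until one middleware returns a response. After that, neither the remaining `process_request()` methods nor the download itself run, and `process_response()` runs on every middleware in descending order. The `Middleware` class, `download_with_chain`, and the names and priority numbers in the usage lines are hypothetical and only illustrative.

```python
from typing import Callable, List, Optional, Tuple


class Middleware:
    """Hypothetical middleware that records its calls and may answer a request itself."""

    def __init__(self, name: str, answer: Optional[str] = None):
        self.name = name
        self.answer = answer  # if set, process_request returns this "response"

    def process_request(self, request: str, log: List[str]) -> Optional[str]:
        log.append(f"{self.name}.process_request")
        return self.answer

    def process_response(self, response: str, log: List[str]) -> str:
        log.append(f"{self.name}.process_response")
        return response


def download_with_chain(
    request: str,
    middlewares: List[Tuple[int, Middleware]],
    download: Callable[[str], str],
    log: List[str],
) -> str:
    # process_request(): ascending priority order, stop at the first response.
    ordered = [mw for _, mw in sorted(middlewares, key=lambda pair: pair[0])]
    response = None
    for mw in ordered:
        response = mw.process_request(request, log)
        if response is not None:
            break
    else:
        # No middleware answered, so the (simulated) downloader fetches the page.
        log.append("download")
        response = download(request)
    # process_response(): every middleware, descending priority order.
    for mw in reversed(ordered):
        response = mw.process_response(response, log)
    return response


# Usage: a "browser" middleware at 900 answers after the lower-numbered ones ran.
log: List[str] = []
chain = [
    (590, Middleware("Compression")),
    (900, Middleware("Browser", answer="<html>rendered</html>")),
    (600, Middleware("Redirect")),
]
result = download_with_chain("https://example.com/", chain, lambda r: "<html>fetched</html>", log)

# Without an answering middleware the download step runs.
log2: List[str] = []
plain = download_with_chain(
    "https://example.com/", [(600, Middleware("Redirect"))], lambda r: "<html>fetched</html>", log2
)
```

In this model the browser middleware's priority of 900 means every lower-numbered middleware sees the request first, and all of them still see the rendered response on the way back.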
Spider:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class SomeSpider(CrawlSpider):
    name = "some site"
    allowed_domains = ['some_site.com']
    start_urls = ['https://some_site.com/']

    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            'middlewares_custom.MyCustomMiddleware': 900,
        }
    }

    allow_urls = ...  # elided in the original paste
    deny_urls = ...  # elided in the original paste

    # CrawlSpider tries the rules in order; a link already extracted by an
    # earlier rule is not offered to later rules.
    rules = (
        Rule(LinkExtractor(allow='/something'), callback='parse_item', follow=True),
        Rule(LinkExtractor(allow=allow_urls, deny=deny_urls)),
    )

    def parse_item(self, response):
        print('I am never called :(')
        return []
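Because `parse_item` is only the callback of the first rule, one way to debug it is to check which rule would claim each URL. The block below is a simplified model of `CrawlSpider`'s documented behavior and assumes only URL-level filtering matters. Rules are tried in order, and a link already taken by an earlier rule is not offered to later ones. `allow`/`deny` patterns are regular expressions searched in the absolute URL, and an empty `allow` accepts everything. The model ignores `allowed_domains`, `deny_extensions`, URL canonicalization and HTML parsing. `SimpleRule`, `assign_links`, the sample URLs and the second rule's patterns are hypothetical stand-ins, since `allow_urls`/`deny_urls` are elided in the paste.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class SimpleRule:
    """Hypothetical stand-in for Rule(LinkExtractor(allow=..., deny=...), callback=...)."""
    allow: Sequence[str] = ()
    deny: Sequence[str] = ()
    callback: Optional[str] = None

    def matches(self, url: str) -> bool:
        # An empty allow list accepts every URL; a deny match always rejects.
        if self.allow and not any(re.search(p, url) for p in self.allow):
            return False
        if any(re.search(p, url) for p in self.deny):
            return False
        return True


def assign_links(urls: List[str], rules: List[SimpleRule]) -> Dict[str, Optional[int]]:
    """Map each URL to the index of the first rule that claims it (None if no rule does)."""
    seen = set()
    assignment: Dict[str, Optional[int]] = {url: None for url in urls}
    for index, rule in enumerate(rules):
        for url in urls:
            if url not in seen and rule.matches(url):
                seen.add(url)
                assignment[url] = index
    return assignment


# Usage with hypothetical patterns for the second rule.
rules = [
    SimpleRule(allow=["/something"], callback="parse_item"),
    SimpleRule(allow=["/category/"], deny=["/login"]),
]
urls = [
    "https://some_site.com/something/42",
    "https://some_site.com/category/shoes",
    "https://some_site.com/category/something",
    "https://some_site.com/login",
]
result = assign_links(urls, rules)
callbacks = {url: (rules[i].callback if i is not None else None) for url, i in result.items()}
```

In this model, `/category/something` matches both rules but goes to the first one, while `/category/shoes` is claimed by the second rule, so it is followed without ever reaching `parse_item`.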