Untitled

a guest
Aug 19th, 2017
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.loader import ItemLoader
from scrapy.loader.processors import TakeFirst  # import path as of Scrapy 1.4
from scrapy.spiders import CrawlSpider, Rule


class TuiItem(scrapy.Item):
    url = scrapy.Field()
    name = scrapy.Field()
    costs = scrapy.Field()
    nights = scrapy.Field()
    country = scrapy.Field()


class TuiLoader(ItemLoader):
    # ItemLoader replaces the deprecated XPathItemLoader.
    default_output_processor = TakeFirst()


class bookspider(CrawlSpider):
    name = 'booking_spider'

    # Scrapy reads start_urls (plural); a start_url attribute is ignored,
    # which leaves the spider with nothing to request.
    start_urls = [
        'https://www.tui.ru/ToursSearch//ToursSearch/Europe/Bulgaria.aspx']
    # allowed_domains takes bare host names, not URLs.
    allowed_domains = ['www.tui.ru']

    rules = (
        Rule(LinkExtractor(allow=(r'/ToursSearch/Europe/Bulgaria\.aspx',)),
             callback='parse_item'),
    )

    def parse_item(self, response):
        # All four fields still share the same placeholder XPath.
        item = TuiItem()
        # The field is declared as 'costs'; item['cost'] would raise KeyError.
        item['costs'] = response.xpath("//*[@id='resultsHeader']/text()").extract()
        item['name'] = response.xpath("//*[@id='resultsHeader']/text()").extract()
        item['nights'] = response.xpath("//*[@id='resultsHeader']/text()").extract()
        item['country'] = response.xpath("//*[@id='resultsHeader']/text()").extract()
        return item

/home/garcia/tutorial/tutorial/spiders/booking_spider.py:19: ScrapyDeprecationWarning: tutorial.spiders.booking_spider.TuiLoader inherits from deprecated class scrapy.loader.XPathItemLoader, please inherit from scrapy.loader.ItemLoader. (warning only on first subclass, there may be others)
  class TuiLoader(XPathItemLoader):
2017-08-20 02:38:29 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: tutorial)
2017-08-20 02:38:29 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'tutorial', 'NEWSPIDER_MODULE': 'tutorial.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['tutorial.spiders']}
2017-08-20 02:38:29 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2017-08-20 02:38:29 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-08-20 02:38:29 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-08-20 02:38:29 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-08-20 02:38:29 [scrapy.core.engine] INFO: Spider opened
2017-08-20 02:38:29 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-08-20 02:38:29 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2017-08-20 02:38:29 [scrapy.core.engine] INFO: Closing spider (finished)
2017-08-20 02:38:29 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 8, 19, 23, 38, 29, 209335),
 'log_count/DEBUG': 1,
 'log_count/INFO': 7,
 'memusage/max': 49655808,
 'memusage/startup': 49655808,
 'start_time': datetime.datetime(2017, 8, 19, 23, 38, 29, 204191)}
2017-08-20 02:38:29 [scrapy.core.engine] INFO: Spider closed (finished)
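
The log lists OffsiteMiddleware among the enabled spider middlewares. That middleware compares request host names against allowed_domains, so entries are expected to be bare host names rather than full URLs. Below is a minimal sketch of turning a URL-style entry into a host name. It assumes Python 3, and to_domain is a hypothetical helper, not part of Scrapy.

```python
from urllib.parse import urlparse


def to_domain(entry):
    """Return the bare host name for an allowed_domains entry.

    Hypothetical helper for illustration; Scrapy does not provide it.
    """
    # urlparse only fills .hostname when a scheme or a leading '//' is
    # present, so bare entries are prefixed with '//' before parsing.
    parsed = urlparse(entry if "//" in entry else "//" + entry)
    return parsed.hostname


if __name__ == "__main__":
    print(to_domain("https://www.tui.ru/ToursSearch"))  # www.tui.ru
    print(to_domain("www.tui.ru"))                      # www.tui.ru
```

The returned value is what would go into allowed_domains, e.g. allowed_domains = [to_domain('https://www.tui.ru/ToursSearch')].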