scrapytest1

a guest
Jul 26th, 2017
  1. 2017-07-26 09:38:37 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: Thesis)
  2. 2017-07-26 09:38:37 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'Thesis.spiders', 'CONCURRENT_REQUESTS': 200, 'SPIDER_MODULES': ['Thesis.spiders'], 'BOT_NAME': 'Thesis', 'CONCURRENT_ITEMS': 400, 'COOKIES_ENABLED': False, 'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36', 'DOWNLOAD_DELAY': 4}
  3. 2017-07-26 09:38:37 [scrapy.middleware] INFO: Enabled extensions:
  4. ['scrapy.extensions.logstats.LogStats',
  5. 'scrapy.extensions.telnet.TelnetConsole',
  6. 'scrapy.extensions.corestats.CoreStats']
  7. 2017-07-26 09:38:40 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52986/wd/hub/session {"capabilities": {"alwaysMatch": {"platform": "ANY", "browserName": "phantomjs", "version": "", "javascriptEnabled": true}, "firstMatch": []}, "desiredCapabilities": {"platform": "ANY", "browserName": "phantomjs", "version": "", "javascriptEnabled": true}}
  8. 2017-07-26 09:38:40 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  9. 2017-07-26 09:38:40 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/window/current/size {"width": 1120, "windowHandle": "current", "sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "height": 550}
  10. 2017-07-26 09:38:40 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  11. 2017-07-26 09:38:40 [scrapy.middleware] INFO: Enabled downloader middlewares:
  12. ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
  13. 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
  14. 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
  15. 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
  16. 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
  17. 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
  18. 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
  19. 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
  20. 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
  21. 'scrapy.downloadermiddlewares.stats.DownloaderStats']
  22. 2017-07-26 09:38:40 [scrapy.middleware] INFO: Enabled spider middlewares:
  23. ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
  24. 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
  25. 'scrapy.spidermiddlewares.referer.RefererMiddleware',
  26. 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
  27. 'scrapy.spidermiddlewares.depth.DepthMiddleware']
  28. 2017-07-26 09:38:40 [scrapy.middleware] INFO: Enabled item pipelines:
  29. ['Thesis.pipelines.PcworldPipeline']
  30. 2017-07-26 09:38:40 [scrapy.core.engine] INFO: Spider opened
  31. 2017-07-26 09:38:40 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
  32. 2017-07-26 09:38:40 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
  33. 2017-07-26 09:38:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.pcworld.com/search?query=heartbleed> (referer: None)
  34. 2017-07-26 09:38:43 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/url {"url": "http://www.pcworld.com/search?query=heartbleed", "sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9"}
  35. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  36. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element {"using": "class name", "sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "value": "excerpt-text"}
  37. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  38. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/elements {"using": "xpath", "sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "value": "//div[@class=\"excerpt-text\"]/h3/a"}
  39. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  40. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054740524/attribute/href {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "href", "id": ":wdc:1501054740524"}
  41. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  42. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054740525/attribute/href {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "href", "id": ":wdc:1501054740525"}
  43. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  44. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054740526/attribute/href {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "href", "id": ":wdc:1501054740526"}
  45. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  46. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054740527/attribute/href {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "href", "id": ":wdc:1501054740527"}
  47. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  48. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054740528/attribute/href {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "href", "id": ":wdc:1501054740528"}
  49. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  50. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054740529/attribute/href {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "href", "id": ":wdc:1501054740529"}
  51. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  52. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054740530/attribute/href {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "href", "id": ":wdc:1501054740530"}
  53. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  54. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054740531/attribute/href {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "href", "id": ":wdc:1501054740531"}
  55. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  56. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054740532/attribute/href {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "href", "id": ":wdc:1501054740532"}
  57. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  58. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054740533/attribute/href {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "href", "id": ":wdc:1501054740533"}
  59. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  60. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054740534/attribute/href {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "href", "id": ":wdc:1501054740534"}
  61. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  62. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/elements {"using": "xpath", "sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "value": "//a[@rel='next']"}
  63. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  64. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element {"using": "xpath", "sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "value": "//a[@rel='next']"}
  65. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  66. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054740535/attribute/href {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "href", "id": ":wdc:1501054740535"}
  67. 2017-07-26 09:39:00 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  68. 2017-07-26 09:39:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.pcworld.com/article/2146081/healthcare-gov-users-required-to-change-passwords-heartbleed.html> (referer: http://www.pcworld.com/search?query=heartbleed)
  69. 2017-07-26 09:39:02 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/url {"url": "http://www.pcworld.com/article/2146081/healthcare-gov-users-required-to-change-passwords-heartbleed.html", "sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9"}
  70. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  71. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element {"using": "xpath", "sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "value": "//h1[@itemprop='headline']"}
  72. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  73. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054773836/attribute/innerHTML {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "innerHTML", "id": ":wdc:1501054773836"}
  74. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  75. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element {"using": "xpath", "sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "value": "//meta[@name='date']"}
  76. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  77. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054773837/attribute/content {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "content", "id": ":wdc:1501054773837"}
  78. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  79. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/elements {"using": "xpath", "sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "value": "//div[contains(@itemprop, 'articleBody')]//p"}
  80. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  81. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054773838/attribute/innerHTML {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "innerHTML", "id": ":wdc:1501054773838"}
  82. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  83. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054773839/attribute/innerHTML {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "innerHTML", "id": ":wdc:1501054773839"}
  84. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  85. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054773840/attribute/innerHTML {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "innerHTML", "id": ":wdc:1501054773840"}
  86. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  87. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054773841/attribute/innerHTML {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "innerHTML", "id": ":wdc:1501054773841"}
  88. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  89. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054773842/attribute/innerHTML {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "innerHTML", "id": ":wdc:1501054773842"}
  90. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  91. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/element/:wdc:1501054773843/attribute/innerHTML {"sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9", "name": "innerHTML", "id": ":wdc:1501054773843"}
  92. 2017-07-26 09:39:33 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
  93. 2017-07-26 09:39:33 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.pcworld.com/article/2146081/healthcare-gov-users-required-to-change-passwords-heartbleed.html>
  94.  
  95. {'Article': u'If you have an account with HealthCare.gov, you can expect to change your password the next time you log in. And you can thank Heartbleed for it. According to the website, all HeathCare.gov users will be prompted to change their passwords the next time they log into the site. According to the site,\xa0"HealthCare.gov uses many layers of protections to secure your information," and theres no sign that any Healthcare.gov user information has been compromised, so this is mainly a precautionary measure. The Associated Press notes that the US Government is reviewing al of its sites to see if theyre vulnerable to the Heartbleed bug, so its possible that users of other government sites may have to change their passwords in the not-too-distant future. HealthCare.gov recommends using a password thats unique to your Healthcare.gov account. Some password managers, such as 1Password,\xa0can generate and store unique passwords that you dont need to memorize. But you dont need a password manager to devise stronger passwords: There are some tricks you can employ to create strong passwords that you can actually remember. See Alex Wawros guide to creating stronger passwords without losing your mind\xa0for one approach. And visit HealthCare.gov for more on that sites mandatory password change requirement. This story, "Blame Heartbleed: HealthCare.gov requires users to change their passwords" was originally published by TechHive.',
  96. 'Datum': u'2014-04-19',
  97. 'Original_URL': 'http://www.pcworld.com/article/2146081/healthcare-gov-users-required-to-change-passwords-heartbleed.html',
  98. 'Ueberschrift': 'Blame Heartbleed: HealthCare.gov requires users to change their passwords'}
  99. 2017-07-26 09:39:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.pcworld.com/search?query=heartbleed&start=10> (referer: http://www.pcworld.com/search?query=heartbleed)
  100. 2017-07-26 09:39:35 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52986/wd/hub/session/71437af0-71d5-11e7-83de-29eb7aea97e9/url {"url": "http://www.pcworld.com/search?query=heartbleed&start=10", "sessionId": "71437af0-71d5-11e7-83de-29eb7aea97e9"}