a guest
Nov 4th, 2014
------ Not working url: http://www.mass.gov/eea/agencies/dfg/der/ ---------

2014-11-04 14:48:45-0500 [scrapy] INFO: Scrapy 0.24.4 started (bot: govcrawl)
2014-11-04 14:48:45-0500 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2014-11-04 14:48:45-0500 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'govcrawl.spiders', 'DEPTH_LIMIT': 3, 'SPIDER_MODULES': ['govcrawl.spiders'], 'BOT_NAME': 'govcrawl', 'DOWNLOAD_TIMEOUT': 60, 'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0', 'DOWNLOAD_DELAY': 1.5}
2014-11-04 14:48:45-0500 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-11-04 14:48:46-0500 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-11-04 14:48:46-0500 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-11-04 14:48:46-0500 [scrapy] INFO: Enabled item pipelines: DomainPipeline
2014-11-04 14:48:46-0500 [govcrawl_main] INFO: Spider opened
2014-11-04 14:48:46-0500 [govcrawl_main] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-11-04 14:48:46-0500 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2014-11-04 14:48:46-0500 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2014-11-04 14:48:46-0500 [govcrawl_main] DEBUG: Crawled (200) <GET http://www.mass.gov/eea/agencies/dfg/der/> (referer: None)
2014-11-04 14:48:46-0500 [govcrawl_main] INFO: URL: http://www.mass.gov/eea/agencies/dfg/der/ (0) Crawled 1 pages. To Crawl: 0
2014-11-04 14:48:46-0500 [govcrawl_main] DEBUG: Scraped from <200 http://www.mass.gov/eea/agencies/dfg/der/>
None
2014-11-04 14:48:46-0500 [govcrawl_main] INFO: Closing spider (finished)
2014-11-04 14:48:46-0500 [govcrawl_main] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 274,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 24320,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2014, 11, 4, 19, 48, 46, 156057),
'item_scraped_count': 1,
'log_count/DEBUG': 4,
'log_count/INFO': 8,
'pages_crawled': 1,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2014, 11, 4, 19, 48, 46, 61865)}
2014-11-04 14:48:46-0500 [govcrawl_main] INFO: Spider closed (finished)

------ Working url: http://www.attleboroschools.com/schools/studley_elementary/index.php ---------
2014-11-04 15:12:23-0500 [scrapy] INFO: Scrapy 0.24.4 started (bot: govcrawl)
2014-11-04 15:12:23-0500 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2014-11-04 15:12:23-0500 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'govcrawl.spiders', 'DEPTH_LIMIT': 3, 'SPIDER_MODULES': ['govcrawl.spiders'], 'BOT_NAME': 'govcrawl', 'DOWNLOAD_TIMEOUT': 60, 'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0', 'DOWNLOAD_DELAY': 1.5}
2014-11-04 15:12:23-0500 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-11-04 15:12:23-0500 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-11-04 15:12:23-0500 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-11-04 15:12:24-0500 [scrapy] INFO: Enabled item pipelines: DomainPipeline
2014-11-04 15:12:24-0500 [govcrawl_main] INFO: Spider opened
2014-11-04 15:12:24-0500 [govcrawl_main] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-11-04 15:12:24-0500 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2014-11-04 15:12:24-0500 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2014-11-04 15:12:24-0500 [govcrawl_main] DEBUG: Crawled (200) <GET http://www.attleboroschools.com/schools/studley_elementary/index.php> (referer: None)
2014-11-04 15:12:24-0500 [govcrawl_main] INFO: URL: http://www.attleboroschools.com/schools/studley_elementary/index.php (0) Crawled 1 pages. To Crawl: 0
2014-11-04 15:12:24-0500 [govcrawl_main] DEBUG: Scraped from <200 http://www.attleboroschools.com/schools/studley_elementary/index.php>
None
2014-11-04 15:12:26-0500 [govcrawl_main] DEBUG: Crawled (200) <GET http://www.attleboroschools.com/schools/studley_elementary/studley_nurse_-_marie_mclaughlin.php> (referer: http://www.attleboroschools.com/schools/studley_elementary/index.php)
2014-11-04 15:12:26-0500 [govcrawl_main] INFO: URL: http://www.attleboroschools.com/schools/studley_elementary/index.php (0) Crawled 2 pages. To Crawl: 12
2014-11-04 15:12:26-0500 [govcrawl_main] DEBUG: Scraped from <200 http://www.attleboroschools.com/schools/studley_elementary/studley_nurse_-_marie_mclaughlin.php>
None
2014-11-04 15:12:26-0500 [govcrawl_main] DEBUG: Filtered duplicate request: <GET http://www.attleboroschools.com/schools/studley_elementary/index.php> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2014-11-04 15:12:27-0500 [govcrawl_main] DEBUG: Crawled (200) <GET http://www.attleboroschools.com/schools/studley_elementary/studley_student_services.php> (referer: http://www.attleboroschools.com/schools/studley_elementary/index.php)
2014-11-04 15:12:27-0500 [govcrawl_main] INFO: URL: http://www.attleboroschools.com/schools/studley_elementary/index.php (0) Crawled 3 pages. To Crawl: 11
2014-11-04 15:12:27-0500 [govcrawl_main] DEBUG: Scraped from <200 http://www.attleboroschools.com/schools/studley_elementary/studley_student_services.php>
None
2014-11-04 15:12:28-0500 [govcrawl_main] DEBUG: Crawled (200) <GET http://www.attleboroschools.com/schools/studley_elementary/studley_educators.php> (referer: http://www.attleboroschools.com/schools/studley_elementary/index.php)
2014-11-04 15:12:28-0500 [govcrawl_main] INFO: URL: http://www.attleboroschools.com/schools/studley_elementary/index.php (0) Crawled 4 pages. To Crawl: 10
2014-11-04 15:12:28-0500 [govcrawl_main] DEBUG: Scraped from <200 http://www.attleboroschools.com/schools/studley_elementary/studley_educators.php>
None
^C2014-11-04 15:12:29-0500 [scrapy] INFO: Received SIGINT, shutting down gracefully. Send again to force
2014-11-04 15:12:29-0500 [govcrawl_main] INFO: Closing spider (shutdown)
2014-11-04 15:12:30-0500 [govcrawl_main] DEBUG: Crawled (200) <GET http://www.attleboroschools.com/schools/studley_elementary/studley_families.php> (referer: http://www.attleboroschools.com/schools/studley_elementary/index.php)
2014-11-04 15:12:30-0500 [govcrawl_main] INFO: URL: http://www.attleboroschools.com/schools/studley_elementary/index.php (0) Crawled 5 pages. To Crawl: 9
2014-11-04 15:12:30-0500 [govcrawl_main] DEBUG: Scraped from <200 http://www.attleboroschools.com/schools/studley_elementary/studley_families.php>
None
etc...
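
The two runs differ mainly in the spider's own progress lines: the not-working URL never gets past "To Crawl: 0", while the working URL's queue grows to 12 after the second page. As a rough aid for comparing such logs, the sketch below pulls those counters out of the log text. It is only an illustration using the standard library. The names `PROGRESS`, `progress_entries`, and `queue_never_grew` are hypothetical and not part of Scrapy or the govcrawl project, and the pattern assumes the progress lines always have the wording seen in this paste.

```python
import re

# Matches the spider's custom progress lines exactly as they appear in the
# paste, e.g. "INFO: URL: <start url> (0) Crawled 2 pages. To Crawl: 12".
# The meaning of the parenthesised number is not stated in the logs, so it
# is matched but not interpreted.
PROGRESS = re.compile(
    r"INFO: URL: (?P<start_url>\S+) \(\d+\) "
    r"Crawled (?P<crawled>\d+) pages\. To Crawl: (?P<pending>\d+)"
)


def progress_entries(log_text):
    """Return a list of (start_url, crawled, pending) tuples, in log order."""
    entries = []
    for line in log_text.splitlines():
        match = PROGRESS.search(line)
        if match:
            entries.append(
                (
                    match.group("start_url"),
                    int(match.group("crawled")),
                    int(match.group("pending")),
                )
            )
    return entries


def queue_never_grew(log_text):
    """True if progress lines exist and every one reports 'To Crawl: 0'."""
    entries = progress_entries(log_text)
    return bool(entries) and all(pending == 0 for _, _, pending in entries)
```

Feeding each run's log text to `queue_never_grew` separates the two cases shown here. It does not explain why no links were queued for the first URL; that would need a look at the spider's link-extraction code, which is not included in the paste.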