Nutch 1.7 hadoop.log

a guest
Oct 24th, 2013
  1. 2013-10-22 17:36:32.667 java[671:1203] Unable to load realm info from SCDynamicStore
  2. crawl started in: crawl-20131022173632
  3. rootUrlDir = urls
  4. threads = 10
  5. depth = 3
  6. solrUrl=http://localhost:8983/solr/
  7. topN = 5
  8. Injector: starting at 2013-10-22 17:36:32
  9. Injector: crawlDb: crawl-20131022173632/crawldb
  10. Injector: urlDir: urls
  11. Injector: Converting injected urls to crawl db entries.
  12. Injector: total number of urls rejected by filters: 0
  13. Injector: total number of urls injected after normalization and filtering: 1
  14. Injector: Merging injected urls into crawl db.
  15. Injector: finished at 2013-10-22 17:36:35, elapsed: 00:00:02
  16. Generator: starting at 2013-10-22 17:36:35
  17. Generator: Selecting best-scoring urls due for fetch.
  18. Generator: filtering: true
  19. Generator: normalizing: true
  20. Generator: topN: 5
  21. Generator: jobtracker is 'local', generating exactly one partition.
  22. Generator: Partitioning selected urls for politeness.
  23. Generator: segment: crawl-20131022173632/segments/20131022173637
  24. Generator: finished at 2013-10-22 17:36:38, elapsed: 00:00:03
  25. Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
  26. Fetcher: starting at 2013-10-22 17:36:38
  27. Fetcher: segment: crawl-20131022173632/segments/20131022173637
  28. Using queue mode : byHost
  29. Fetcher: threads: 10
  30. Fetcher: time-out divisor: 2
  31. QueueFeeder finished: total 1 records + hit by time limit :0
  32. Using queue mode : byHost
  33. Using queue mode : byHost
  34. Using queue mode : byHost
  35. -finishing thread FetcherThread, activeThreads=1
  36. fetching http://rockies.edu/ (queue crawl delay=5000ms)
  37. -finishing thread FetcherThread, activeThreads=1
  38. Using queue mode : byHost
  39. -finishing thread FetcherThread, activeThreads=1
  40. Using queue mode : byHost
  41. Using queue mode : byHost
  42. -finishing thread FetcherThread, activeThreads=1
  43. -finishing thread FetcherThread, activeThreads=1
  44. Using queue mode : byHost
  45. -finishing thread FetcherThread, activeThreads=1
  46. Using queue mode : byHost
  47. -finishing thread FetcherThread, activeThreads=1
  48. Using queue mode : byHost
  49. -finishing thread FetcherThread, activeThreads=1
  50. Using queue mode : byHost
  51. Fetcher: throughput threshold: -1
  52. Fetcher: throughput threshold retries: 5
  53. -finishing thread FetcherThread, activeThreads=1
  54. -finishing thread FetcherThread, activeThreads=0
  55. -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
  56. -activeThreads=0
  57. Fetcher: finished at 2013-10-22 17:36:40, elapsed: 00:00:02
  58. ParseSegment: starting at 2013-10-22 17:36:40
  59. ParseSegment: segment: crawl-20131022173632/segments/20131022173637
  60. Parsed (12ms):http://rockies.edu/
  61. ParseSegment: finished at 2013-10-22 17:36:41, elapsed: 00:00:01
  62. CrawlDb update: starting at 2013-10-22 17:36:41
  63. CrawlDb update: db: crawl-20131022173632/crawldb
  64. CrawlDb update: segments: [crawl-20131022173632/segments/20131022173637]
  65. CrawlDb update: additions allowed: true
  66. CrawlDb update: URL normalizing: true
  67. CrawlDb update: URL filtering: true
  68. CrawlDb update: 404 purging: false
  69. CrawlDb update: Merging segment data into db.
  70. CrawlDb update: finished at 2013-10-22 17:36:42, elapsed: 00:00:01
  71. Generator: starting at 2013-10-22 17:36:42
  72. Generator: Selecting best-scoring urls due for fetch.
  73. Generator: filtering: true
  74. Generator: normalizing: true
  75. Generator: topN: 5
  76. Generator: jobtracker is 'local', generating exactly one partition.
  77. Generator: Partitioning selected urls for politeness.
  78. Generator: segment: crawl-20131022173632/segments/20131022173644
  79. Generator: finished at 2013-10-22 17:36:45, elapsed: 00:00:03
  80. Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
  81. Fetcher: starting at 2013-10-22 17:36:45
  82. Fetcher: segment: crawl-20131022173632/segments/20131022173644
  83. Using queue mode : byHost
  84. Fetcher: threads: 10
  85. Fetcher: time-out divisor: 2
  86. QueueFeeder finished: total 5 records + hit by time limit :0
  87. Using queue mode : byHost
  88. fetching http://rockies.edu/RequestMoreInfo.htm (queue crawl delay=5000ms)
  89. Using queue mode : byHost
  90. Using queue mode : byHost
  91. Using queue mode : byHost
  92. Using queue mode : byHost
  93. Using queue mode : byHost
  94. Using queue mode : byHost
  95. Using queue mode : byHost
  96. Using queue mode : byHost
  97. Using queue mode : byHost
  98. Fetcher: throughput threshold: -1
  99. Fetcher: throughput threshold retries: 5
  100. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
  101. * queue: http://rockies.edu
  102.   maxThreads    = 1
  103.   inProgress    = 0
  104.   crawlDelay    = 5000
  105.   minCrawlDelay = 0
  106.   nextFetchTime = 1382488611166
  107.   now           = 1382488607144
  108.   0. http://rockies.edu/index.htm
  109.   1. http://rockies.edu/about.htm
  110.   2. http://rockies.edu/2193.htm
  111.   3. http://rockies.edu/about/assessment.htm
  112. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
  113. * queue: http://rockies.edu
  114.   maxThreads    = 1
  115.   inProgress    = 0
  116.   crawlDelay    = 5000
  117.   minCrawlDelay = 0
  118.   nextFetchTime = 1382488611166
  119.   now           = 1382488608146
  120.   0. http://rockies.edu/index.htm
  121.   1. http://rockies.edu/about.htm
  122.   2. http://rockies.edu/2193.htm
  123.   3. http://rockies.edu/about/assessment.htm
  124. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
  125. * queue: http://rockies.edu
  126.   maxThreads    = 1
  127.   inProgress    = 0
  128.   crawlDelay    = 5000
  129.   minCrawlDelay = 0
  130.   nextFetchTime = 1382488611166
  131.   now           = 1382488609148
  132.   0. http://rockies.edu/index.htm
  133.   1. http://rockies.edu/about.htm
  134.   2. http://rockies.edu/2193.htm
  135.   3. http://rockies.edu/about/assessment.htm
  136. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
  137. * queue: http://rockies.edu
  138.   maxThreads    = 1
  139.   inProgress    = 0
  140.   crawlDelay    = 5000
  141.   minCrawlDelay = 0
  142.   nextFetchTime = 1382488611166
  143.   now           = 1382488610150
  144.   0. http://rockies.edu/index.htm
  145.   1. http://rockies.edu/about.htm
  146.   2. http://rockies.edu/2193.htm
  147.   3. http://rockies.edu/about/assessment.htm
  148. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
  149. * queue: http://rockies.edu
  150.   maxThreads    = 1
  151.   inProgress    = 0
  152.   crawlDelay    = 5000
  153.   minCrawlDelay = 0
  154.   nextFetchTime = 1382488611166
  155.   now           = 1382488611152
  156.   0. http://rockies.edu/index.htm
  157.   1. http://rockies.edu/about.htm
  158.   2. http://rockies.edu/2193.htm
  159.   3. http://rockies.edu/about/assessment.htm
  160. fetching http://rockies.edu/index.htm (queue crawl delay=5000ms)
  161. -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
  162. * queue: http://rockies.edu
  163.   maxThreads    = 1
  164.   inProgress    = 1
  165.   crawlDelay    = 5000
  166.   minCrawlDelay = 0
  167.   nextFetchTime = 1382488611166
  168.   now           = 1382488612155
  169.   0. http://rockies.edu/about.htm
  170.   1. http://rockies.edu/2193.htm
  171.   2. http://rockies.edu/about/assessment.htm
  172. -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
  173. * queue: http://rockies.edu
  174.   maxThreads    = 1
  175.   inProgress    = 1
  176.   crawlDelay    = 5000
  177.   minCrawlDelay = 0
  178.   nextFetchTime = 1382488611166
  179.   now           = 1382488613156
  180.   0. http://rockies.edu/about.htm
  181.   1. http://rockies.edu/2193.htm
  182.   2. http://rockies.edu/about/assessment.htm
  183. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
  184. * queue: http://rockies.edu
  185.   maxThreads    = 1
  186.   inProgress    = 0
  187.   crawlDelay    = 5000
  188.   minCrawlDelay = 0
  189.   nextFetchTime = 1382488618388
  190.   now           = 1382488614158
  191.   0. http://rockies.edu/about.htm
  192.   1. http://rockies.edu/2193.htm
  193.   2. http://rockies.edu/about/assessment.htm
  194. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
  195. * queue: http://rockies.edu
  196.   maxThreads    = 1
  197.   inProgress    = 0
  198.   crawlDelay    = 5000
  199.   minCrawlDelay = 0
  200.   nextFetchTime = 1382488618388
  201.   now           = 1382488615160
  202.   0. http://rockies.edu/about.htm
  203.   1. http://rockies.edu/2193.htm
  204.   2. http://rockies.edu/about/assessment.htm
  205. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
  206. * queue: http://rockies.edu
  207.   maxThreads    = 1
  208.   inProgress    = 0
  209.   crawlDelay    = 5000
  210.   minCrawlDelay = 0
  211.   nextFetchTime = 1382488618388
  212.   now           = 1382488616162
  213.   0. http://rockies.edu/about.htm
  214.   1. http://rockies.edu/2193.htm
  215.   2. http://rockies.edu/about/assessment.htm
  216. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
  217. * queue: http://rockies.edu
  218.   maxThreads    = 1
  219.   inProgress    = 0
  220.   crawlDelay    = 5000
  221.   minCrawlDelay = 0
  222.   nextFetchTime = 1382488618388
  223.   now           = 1382488617164
  224.   0. http://rockies.edu/about.htm
  225.   1. http://rockies.edu/2193.htm
  226.   2. http://rockies.edu/about/assessment.htm
  227. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
  228. * queue: http://rockies.edu
  229.   maxThreads    = 1
  230.   inProgress    = 0
  231.   crawlDelay    = 5000
  232.   minCrawlDelay = 0
  233.   nextFetchTime = 1382488618388
  234.   now           = 1382488618165
  235.   0. http://rockies.edu/about.htm
  236.   1. http://rockies.edu/2193.htm
  237.   2. http://rockies.edu/about/assessment.htm
  238. fetching http://rockies.edu/about.htm (queue crawl delay=5000ms)
  239. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
  240. * queue: http://rockies.edu
  241.   maxThreads    = 1
  242.   inProgress    = 0
  243.   crawlDelay    = 5000
  244.   minCrawlDelay = 0
  245.   nextFetchTime = 1382488623473
  246.   now           = 1382488619167
  247.   0. http://rockies.edu/2193.htm
  248.   1. http://rockies.edu/about/assessment.htm
  249. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
  250. * queue: http://rockies.edu
  251.   maxThreads    = 1
  252.   inProgress    = 0
  253.   crawlDelay    = 5000
  254.   minCrawlDelay = 0
  255.   nextFetchTime = 1382488623473
  256.   now           = 1382488620169
  257.   0. http://rockies.edu/2193.htm
  258.   1. http://rockies.edu/about/assessment.htm
  259. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
  260. * queue: http://rockies.edu
  261.   maxThreads    = 1
  262.   inProgress    = 0
  263.   crawlDelay    = 5000
  264.   minCrawlDelay = 0
  265.   nextFetchTime = 1382488623473
  266.   now           = 1382488621170
  267.   0. http://rockies.edu/2193.htm
  268.   1. http://rockies.edu/about/assessment.htm
  269. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
  270. * queue: http://rockies.edu
  271.   maxThreads    = 1
  272.   inProgress    = 0
  273.   crawlDelay    = 5000
  274.   minCrawlDelay = 0
  275.   nextFetchTime = 1382488623473
  276.   now           = 1382488622172
  277.   0. http://rockies.edu/2193.htm
  278.   1. http://rockies.edu/about/assessment.htm
  279. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
  280. * queue: http://rockies.edu
  281.   maxThreads    = 1
  282.   inProgress    = 0
  283.   crawlDelay    = 5000
  284.   minCrawlDelay = 0
  285.   nextFetchTime = 1382488623473
  286.   now           = 1382488623174
  287.   0. http://rockies.edu/2193.htm
  288.   1. http://rockies.edu/about/assessment.htm
  289. fetching http://rockies.edu/2193.htm (queue crawl delay=5000ms)
  290. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
  291. * queue: http://rockies.edu
  292.   maxThreads    = 1
  293.   inProgress    = 0
  294.   crawlDelay    = 5000
  295.   minCrawlDelay = 0
  296.   nextFetchTime = 1382488628518
  297.   now           = 1382488624176
  298.   0. http://rockies.edu/about/assessment.htm
  299. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
  300. * queue: http://rockies.edu
  301.   maxThreads    = 1
  302.   inProgress    = 0
  303.   crawlDelay    = 5000
  304.   minCrawlDelay = 0
  305.   nextFetchTime = 1382488628518
  306.   now           = 1382488625177
  307.   0. http://rockies.edu/about/assessment.htm
  308. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
  309. * queue: http://rockies.edu
  310.   maxThreads    = 1
  311.   inProgress    = 0
  312.   crawlDelay    = 5000
  313.   minCrawlDelay = 0
  314.   nextFetchTime = 1382488628518
  315.   now           = 1382488626179
  316.   0. http://rockies.edu/about/assessment.htm
  317. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
  318. * queue: http://rockies.edu
  319.   maxThreads    = 1
  320.   inProgress    = 0
  321.   crawlDelay    = 5000
  322.   minCrawlDelay = 0
  323.   nextFetchTime = 1382488628518
  324.   now           = 1382488627181
  325.   0. http://rockies.edu/about/assessment.htm
  326. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
  327. * queue: http://rockies.edu
  328.   maxThreads    = 1
  329.   inProgress    = 0
  330.   crawlDelay    = 5000
  331.   minCrawlDelay = 0
  332.   nextFetchTime = 1382488628518
  333.   now           = 1382488628183
  334.   0. http://rockies.edu/about/assessment.htm
  335. fetching http://rockies.edu/about/assessment.htm (queue crawl delay=5000ms)
  336. -finishing thread FetcherThread, activeThreads=9
  337. -finishing thread FetcherThread, activeThreads=8
  338. -finishing thread FetcherThread, activeThreads=6
  339. -finishing thread FetcherThread, activeThreads=6
  340. -finishing thread FetcherThread, activeThreads=5
  341. -finishing thread FetcherThread, activeThreads=3
  342. -finishing thread FetcherThread, activeThreads=3
  343. -finishing thread FetcherThread, activeThreads=2
  344. -finishing thread FetcherThread, activeThreads=1
  345. -finishing thread FetcherThread, activeThreads=0
  346. -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
  347. -activeThreads=0
  348. Fetcher: finished at 2013-10-22 17:37:10, elapsed: 00:00:24
  349. ParseSegment: starting at 2013-10-22 17:37:10
  350. ParseSegment: segment: crawl-20131022173632/segments/20131022173644
  351. Parsed (6ms):http://rockies.edu/2193.htm
  352. Parsed (7ms):http://rockies.edu/RequestMoreInfo.htm
  353. Parsed (4ms):http://rockies.edu/about.htm
  354. Parsed (4ms):http://rockies.edu/about/assessment.htm
  355. ParseSegment: finished at 2013-10-22 17:37:11, elapsed: 00:00:01
  356. CrawlDb update: starting at 2013-10-22 17:37:11
  357. CrawlDb update: db: crawl-20131022173632/crawldb
  358. CrawlDb update: segments: [crawl-20131022173632/segments/20131022173644]
  359. CrawlDb update: additions allowed: true
  360. CrawlDb update: URL normalizing: true
  361. CrawlDb update: URL filtering: true
  362. CrawlDb update: 404 purging: false
  363. CrawlDb update: Merging segment data into db.
  364. CrawlDb update: finished at 2013-10-22 17:37:12, elapsed: 00:00:01
  365. Generator: starting at 2013-10-22 17:37:12
  366. Generator: Selecting best-scoring urls due for fetch.
  367. Generator: filtering: true
  368. Generator: normalizing: true
  369. Generator: topN: 5
  370. Generator: jobtracker is 'local', generating exactly one partition.
  371. Generator: Partitioning selected urls for politeness.
  372. Generator: segment: crawl-20131022173632/segments/20131022173714
  373. Generator: finished at 2013-10-22 17:37:15, elapsed: 00:00:03
  374. Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
  375. Fetcher: starting at 2013-10-22 17:37:15
  376. Fetcher: segment: crawl-20131022173632/segments/20131022173714
  377. Using queue mode : byHost
  378. Fetcher: threads: 10
  379. Fetcher: time-out divisor: 2
  380. QueueFeeder finished: total 5 records + hit by time limit :0
  381. Using queue mode : byHost
  382. fetching http://rockies.edu/about/diversity.htm (queue crawl delay=5000ms)
  383. Using queue mode : byHost
  384. Using queue mode : byHost
  385. Using queue mode : byHost
  386. Using queue mode : byHost
  387. Using queue mode : byHost
  388. Using queue mode : byHost
  389. Using queue mode : byHost
  390. Using queue mode : byHost
  391. Using queue mode : byHost
  392. Fetcher: throughput threshold: -1
  393. Fetcher: throughput threshold retries: 5
  394. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
  395. * queue: http://rockies.edu
  396.   maxThreads    = 1
  397.   inProgress    = 0
  398.   crawlDelay    = 5000
  399.   minCrawlDelay = 0
  400.   nextFetchTime = 1382488640486
  401.   now           = 1382488636467
  402.   0. http://rockies.edu/about/board.htm
  403.   1. http://rockies.edu/about/faculty.htm
  404.   2. http://rockies.edu/about/accreditation.htm
  405.   3. http://rockies.edu/about/administration.htm
  406. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
  407. * queue: http://rockies.edu
  408.   maxThreads    = 1
  409.   inProgress    = 0
  410.   crawlDelay    = 5000
  411.   minCrawlDelay = 0
  412.   nextFetchTime = 1382488640486
  413.   now           = 1382488637468
  414.   0. http://rockies.edu/about/board.htm
  415.   1. http://rockies.edu/about/faculty.htm
  416.   2. http://rockies.edu/about/accreditation.htm
  417.   3. http://rockies.edu/about/administration.htm
  418. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
  419. * queue: http://rockies.edu
  420.   maxThreads    = 1
  421.   inProgress    = 0
  422.   crawlDelay    = 5000
  423.   minCrawlDelay = 0
  424.   nextFetchTime = 1382488640486
  425.   now           = 1382488638470
  426.   0. http://rockies.edu/about/board.htm
  427.   1. http://rockies.edu/about/faculty.htm
  428.   2. http://rockies.edu/about/accreditation.htm
  429.   3. http://rockies.edu/about/administration.htm
  430. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
  431. * queue: http://rockies.edu
  432.   maxThreads    = 1
  433.   inProgress    = 0
  434.   crawlDelay    = 5000
  435.   minCrawlDelay = 0
  436.   nextFetchTime = 1382488640486
  437.   now           = 1382488639472
  438.   0. http://rockies.edu/about/board.htm
  439.   1. http://rockies.edu/about/faculty.htm
  440.   2. http://rockies.edu/about/accreditation.htm
  441.   3. http://rockies.edu/about/administration.htm
  442. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
  443. * queue: http://rockies.edu
  444.   maxThreads    = 1
  445.   inProgress    = 0
  446.   crawlDelay    = 5000
  447.   minCrawlDelay = 0
  448.   nextFetchTime = 1382488640486
  449.   now           = 1382488640474
  450.   0. http://rockies.edu/about/board.htm
  451.   1. http://rockies.edu/about/faculty.htm
  452.   2. http://rockies.edu/about/accreditation.htm
  453.   3. http://rockies.edu/about/administration.htm
  454. fetching http://rockies.edu/about/board.htm (queue crawl delay=5000ms)
  455. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
  456. * queue: http://rockies.edu
  457.   maxThreads    = 1
  458.   inProgress    = 0
  459.   crawlDelay    = 5000
  460.   minCrawlDelay = 0
  461.   nextFetchTime = 1382488645525
  462.   now           = 1382488641476
  463.   0. http://rockies.edu/about/faculty.htm
  464.   1. http://rockies.edu/about/accreditation.htm
  465.   2. http://rockies.edu/about/administration.htm
  466. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
  467. * queue: http://rockies.edu
  468.   maxThreads    = 1
  469.   inProgress    = 0
  470.   crawlDelay    = 5000
  471.   minCrawlDelay = 0
  472.   nextFetchTime = 1382488645525
  473.   now           = 1382488642478
  474.   0. http://rockies.edu/about/faculty.htm
  475.   1. http://rockies.edu/about/accreditation.htm
  476.   2. http://rockies.edu/about/administration.htm
  477. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
  478. * queue: http://rockies.edu
  479.   maxThreads    = 1
  480.   inProgress    = 0
  481.   crawlDelay    = 5000
  482.   minCrawlDelay = 0
  483.   nextFetchTime = 1382488645525
  484.   now           = 1382488643480
  485.   0. http://rockies.edu/about/faculty.htm
  486.   1. http://rockies.edu/about/accreditation.htm
  487.   2. http://rockies.edu/about/administration.htm
  488. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
  489. * queue: http://rockies.edu
  490.   maxThreads    = 1
  491.   inProgress    = 0
  492.   crawlDelay    = 5000
  493.   minCrawlDelay = 0
  494.   nextFetchTime = 1382488645525
  495.   now           = 1382488644482
  496.   0. http://rockies.edu/about/faculty.htm
  497.   1. http://rockies.edu/about/accreditation.htm
  498.   2. http://rockies.edu/about/administration.htm
  499. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
  500. * queue: http://rockies.edu
  501.   maxThreads    = 1
  502.   inProgress    = 0
  503.   crawlDelay    = 5000
  504.   minCrawlDelay = 0
  505.   nextFetchTime = 1382488645525
  506.   now           = 1382488645484
  507.   0. http://rockies.edu/about/faculty.htm
  508.   1. http://rockies.edu/about/accreditation.htm
  509.   2. http://rockies.edu/about/administration.htm
  510. fetching http://rockies.edu/about/faculty.htm (queue crawl delay=5000ms)
  511. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
  512. * queue: http://rockies.edu
  513.   maxThreads    = 1
  514.   inProgress    = 0
  515.   crawlDelay    = 5000
  516.   minCrawlDelay = 0
  517.   nextFetchTime = 1382488650569
  518.   now           = 1382488646485
  519.   0. http://rockies.edu/about/accreditation.htm
  520.   1. http://rockies.edu/about/administration.htm
  521. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
  522. * queue: http://rockies.edu
  523.   maxThreads    = 1
  524.   inProgress    = 0
  525.   crawlDelay    = 5000
  526.   minCrawlDelay = 0
  527.   nextFetchTime = 1382488650569
  528.   now           = 1382488647487
  529.   0. http://rockies.edu/about/accreditation.htm
  530.   1. http://rockies.edu/about/administration.htm
  531. -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=2
  532. * queue: http://rockies.edu
  533.   maxThreads    = 1
  534.   inProgress    = 0
  535.   crawlDelay    = 5000
  536.   minCrawlDelay = 0
  537.   nextFetchTime = 1382488650569
  538.   now           = 1382488648488
  539.   0. http://rockies.edu/about/accreditation.htm
  540.   1. http://rockies.edu/about/administration.htm
  541. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
  542. * queue: http://rockies.edu
  543.   maxThreads    = 1
  544.   inProgress    = 0
  545.   crawlDelay    = 5000
  546.   minCrawlDelay = 0
  547.   nextFetchTime = 1382488650569
  548.   now           = 1382488649491
  549.   0. http://rockies.edu/about/accreditation.htm
  550.   1. http://rockies.edu/about/administration.htm
  551. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
  552. * queue: http://rockies.edu
  553.   maxThreads    = 1
  554.   inProgress    = 0
  555.   crawlDelay    = 5000
  556.   minCrawlDelay = 0
  557.   nextFetchTime = 1382488650569
  558.   now           = 1382488650493
  559.   0. http://rockies.edu/about/accreditation.htm
  560.   1. http://rockies.edu/about/administration.htm
  561. fetching http://rockies.edu/about/accreditation.htm (queue crawl delay=5000ms)
  562. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
  563. * queue: http://rockies.edu
  564.   maxThreads    = 1
  565.   inProgress    = 0
  566.   crawlDelay    = 5000
  567.   minCrawlDelay = 0
  568.   nextFetchTime = 1382488655606
  569.   now           = 1382488651495
  570.   0. http://rockies.edu/about/administration.htm
  571. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
  572. * queue: http://rockies.edu
  573.   maxThreads    = 1
  574.   inProgress    = 0
  575.   crawlDelay    = 5000
  576.   minCrawlDelay = 0
  577.   nextFetchTime = 1382488655606
  578.   now           = 1382488652496
  579.   0. http://rockies.edu/about/administration.htm
  580. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
  581. * queue: http://rockies.edu
  582.   maxThreads    = 1
  583.   inProgress    = 0
  584.   crawlDelay    = 5000
  585.   minCrawlDelay = 0
  586.   nextFetchTime = 1382488655606
  587.   now           = 1382488653498
  588.   0. http://rockies.edu/about/administration.htm
  589. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
  590. * queue: http://rockies.edu
  591.   maxThreads    = 1
  592.   inProgress    = 0
  593.   crawlDelay    = 5000
  594.   minCrawlDelay = 0
  595.   nextFetchTime = 1382488655606
  596.   now           = 1382488654500
  597.   0. http://rockies.edu/about/administration.htm
  598. -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
  599. * queue: http://rockies.edu
  600.   maxThreads    = 1
  601.   inProgress    = 0
  602.   crawlDelay    = 5000
  603.   minCrawlDelay = 0
  604.   nextFetchTime = 1382488655606
  605.   now           = 1382488655502
  606.   0. http://rockies.edu/about/administration.htm
  607. fetching http://rockies.edu/about/administration.htm (queue crawl delay=5000ms)
  608. -finishing thread FetcherThread, activeThreads=9
  609. -finishing thread FetcherThread, activeThreads=7
  610. -finishing thread FetcherThread, activeThreads=7
  611. -finishing thread FetcherThread, activeThreads=6
  612. -finishing thread FetcherThread, activeThreads=4
  613. -finishing thread FetcherThread, activeThreads=3
  614. -finishing thread FetcherThread, activeThreads=2
  615. -finishing thread FetcherThread, activeThreads=4
  616. -finishing thread FetcherThread, activeThreads=1
  617. -finishing thread FetcherThread, activeThreads=0
  618. -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
  619. -activeThreads=0
  620. Fetcher: finished at 2013-10-22 17:37:37, elapsed: 00:00:22
  621. ParseSegment: starting at 2013-10-22 17:37:37
  622. ParseSegment: segment: crawl-20131022173632/segments/20131022173714
  623. Parsed (3ms):http://rockies.edu/about/accreditation.htm
  624. Parsed (3ms):http://rockies.edu/about/administration.htm
  625. Parsed (3ms):http://rockies.edu/about/board.htm
  626. Parsed (5ms):http://rockies.edu/about/diversity.htm
  627. Parsed (11ms):http://rockies.edu/about/faculty.htm
  628. ParseSegment: finished at 2013-10-22 17:37:38, elapsed: 00:00:01
  629. CrawlDb update: starting at 2013-10-22 17:37:38
  630. CrawlDb update: db: crawl-20131022173632/crawldb
  631. CrawlDb update: segments: [crawl-20131022173632/segments/20131022173714]
  632. CrawlDb update: additions allowed: true
  633. CrawlDb update: URL normalizing: true
  634. CrawlDb update: URL filtering: true
  635. CrawlDb update: 404 purging: false
  636. CrawlDb update: Merging segment data into db.
  637. CrawlDb update: finished at 2013-10-22 17:37:39, elapsed: 00:00:01
  638. LinkDb: starting at 2013-10-22 17:37:39
  639. LinkDb: linkdb: crawl-20131022173632/linkdb
  640. LinkDb: URL normalize: true
  641. LinkDb: URL filter: true
  642. LinkDb: internal links will be ignored.
  643. LinkDb: adding segment: file:/Users/mreyes4/Development/project_solr/nutch/crawl-20131022173632/segments/20131022173637
  644. LinkDb: adding segment: file:/Users/mreyes4/Development/project_solr/nutch/crawl-20131022173632/segments/20131022173644
  645. LinkDb: adding segment: file:/Users/mreyes4/Development/project_solr/nutch/crawl-20131022173632/segments/20131022173714
  646. LinkDb: finished at 2013-10-22 17:37:40, elapsed: 00:00:01
  647. Indexer: starting at 2013-10-22 17:37:40
  648. Indexer: deleting gone documents: false
  649. Indexer: URL filtering: false
  650. Indexer: URL normalizing: false
  651. Active IndexWriters :
  652. SOLRIndexWriter
  653.         solr.server.url : URL of the SOLR instance (mandatory)
  654.         solr.commit.size : buffer size when sending to SOLR (default 1000)
  655.         solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
  656.         solr.auth : use authentication (default false)
  657.         solr.auth.username : use authentication (default false)
  658.         solr.auth : username for authentication
  659.         solr.auth.password : password for authentication
  660.  
  661.  
  662. Exception in thread "main" java.io.IOException: Job failed!
  663.         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
  664.         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
  665.         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:81)
  666.         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:65)
  667.         at org.apache.nutch.crawl.Crawl.run(Crawl.java:155)
  668.         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  669.         at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)