- 2013-10-22 17:36:32.667 java[671:1203] Unable to load realm info from SCDynamicStore
- crawl started in: crawl-20131022173632
- rootUrlDir = urls
- threads = 10
- depth = 3
- solrUrl=http://localhost:8983/solr/
- topN = 5
- Injector: starting at 2013-10-22 17:36:32
- Injector: crawlDb: crawl-20131022173632/crawldb
- Injector: urlDir: urls
- Injector: Converting injected urls to crawl db entries.
- Injector: total number of urls rejected by filters: 0
- Injector: total number of urls injected after normalization and filtering: 1
- Injector: Merging injected urls into crawl db.
- Injector: finished at 2013-10-22 17:36:35, elapsed: 00:00:02
- Generator: starting at 2013-10-22 17:36:35
- Generator: Selecting best-scoring urls due for fetch.
- Generator: filtering: true
- Generator: normalizing: true
- Generator: topN: 5
- Generator: jobtracker is 'local', generating exactly one partition.
- Generator: Partitioning selected urls for politeness.
- Generator: segment: crawl-20131022173632/segments/20131022173637
- Generator: finished at 2013-10-22 17:36:38, elapsed: 00:00:03
- Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
- Fetcher: starting at 2013-10-22 17:36:38
- Fetcher: segment: crawl-20131022173632/segments/20131022173637
- Using queue mode : byHost
- Fetcher: threads: 10
- Fetcher: time-out divisor: 2
- QueueFeeder finished: total 1 records + hit by time limit :0
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- -finishing thread FetcherThread, activeThreads=1
- fetching http://rockies.edu/ (queue crawl delay=5000ms)
- -finishing thread FetcherThread, activeThreads=1
- Using queue mode : byHost
- -finishing thread FetcherThread, activeThreads=1
- Using queue mode : byHost
- Using queue mode : byHost
- -finishing thread FetcherThread, activeThreads=1
- -finishing thread FetcherThread, activeThreads=1
- Using queue mode : byHost
- -finishing thread FetcherThread, activeThreads=1
- Using queue mode : byHost
- -finishing thread FetcherThread, activeThreads=1
- Using queue mode : byHost
- -finishing thread FetcherThread, activeThreads=1
- Using queue mode : byHost
- Fetcher: throughput threshold: -1
- Fetcher: throughput threshold retries: 5
- -finishing thread FetcherThread, activeThreads=1
- -finishing thread FetcherThread, activeThreads=0
- -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
- -activeThreads=0
- Fetcher: finished at 2013-10-22 17:36:40, elapsed: 00:00:02
- ParseSegment: starting at 2013-10-22 17:36:40
- ParseSegment: segment: crawl-20131022173632/segments/20131022173637
- Parsed (12ms):http://rockies.edu/
- ParseSegment: finished at 2013-10-22 17:36:41, elapsed: 00:00:01
- CrawlDb update: starting at 2013-10-22 17:36:41
- CrawlDb update: db: crawl-20131022173632/crawldb
- CrawlDb update: segments: [crawl-20131022173632/segments/20131022173637]
- CrawlDb update: additions allowed: true
- CrawlDb update: URL normalizing: true
- CrawlDb update: URL filtering: true
- CrawlDb update: 404 purging: false
- CrawlDb update: Merging segment data into db.
- CrawlDb update: finished at 2013-10-22 17:36:42, elapsed: 00:00:01
- Generator: starting at 2013-10-22 17:36:42
- Generator: Selecting best-scoring urls due for fetch.
- Generator: filtering: true
- Generator: normalizing: true
- Generator: topN: 5
- Generator: jobtracker is 'local', generating exactly one partition.
- Generator: Partitioning selected urls for politeness.
- Generator: segment: crawl-20131022173632/segments/20131022173644
- Generator: finished at 2013-10-22 17:36:45, elapsed: 00:00:03
- Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
- Fetcher: starting at 2013-10-22 17:36:45
- Fetcher: segment: crawl-20131022173632/segments/20131022173644
- Using queue mode : byHost
- Fetcher: threads: 10
- Fetcher: time-out divisor: 2
- QueueFeeder finished: total 5 records + hit by time limit :0
- Using queue mode : byHost
- fetching http://rockies.edu/RequestMoreInfo.htm (queue crawl delay=5000ms)
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Fetcher: throughput threshold: -1
- Fetcher: throughput threshold retries: 5
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488611166
- now = 1382488607144
- 0. http://rockies.edu/index.htm
- 1. http://rockies.edu/about.htm
- 2. http://rockies.edu/2193.htm
- 3. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488611166
- now = 1382488608146
- 0. http://rockies.edu/index.htm
- 1. http://rockies.edu/about.htm
- 2. http://rockies.edu/2193.htm
- 3. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488611166
- now = 1382488609148
- 0. http://rockies.edu/index.htm
- 1. http://rockies.edu/about.htm
- 2. http://rockies.edu/2193.htm
- 3. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488611166
- now = 1382488610150
- 0. http://rockies.edu/index.htm
- 1. http://rockies.edu/about.htm
- 2. http://rockies.edu/2193.htm
- 3. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488611166
- now = 1382488611152
- 0. http://rockies.edu/index.htm
- 1. http://rockies.edu/about.htm
- 2. http://rockies.edu/2193.htm
- 3. http://rockies.edu/about/assessment.htm
- fetching http://rockies.edu/index.htm (queue crawl delay=5000ms)
- -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 1
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488611166
- now = 1382488612155
- 0. http://rockies.edu/about.htm
- 1. http://rockies.edu/2193.htm
- 2. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 1
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488611166
- now = 1382488613156
- 0. http://rockies.edu/about.htm
- 1. http://rockies.edu/2193.htm
- 2. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488618388
- now = 1382488614158
- 0. http://rockies.edu/about.htm
- 1. http://rockies.edu/2193.htm
- 2. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488618388
- now = 1382488615160
- 0. http://rockies.edu/about.htm
- 1. http://rockies.edu/2193.htm
- 2. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488618388
- now = 1382488616162
- 0. http://rockies.edu/about.htm
- 1. http://rockies.edu/2193.htm
- 2. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488618388
- now = 1382488617164
- 0. http://rockies.edu/about.htm
- 1. http://rockies.edu/2193.htm
- 2. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488618388
- now = 1382488618165
- 0. http://rockies.edu/about.htm
- 1. http://rockies.edu/2193.htm
- 2. http://rockies.edu/about/assessment.htm
- fetching http://rockies.edu/about.htm (queue crawl delay=5000ms)
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488623473
- now = 1382488619167
- 0. http://rockies.edu/2193.htm
- 1. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488623473
- now = 1382488620169
- 0. http://rockies.edu/2193.htm
- 1. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488623473
- now = 1382488621170
- 0. http://rockies.edu/2193.htm
- 1. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488623473
- now = 1382488622172
- 0. http://rockies.edu/2193.htm
- 1. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488623473
- now = 1382488623174
- 0. http://rockies.edu/2193.htm
- 1. http://rockies.edu/about/assessment.htm
- fetching http://rockies.edu/2193.htm (queue crawl delay=5000ms)
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488628518
- now = 1382488624176
- 0. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488628518
- now = 1382488625177
- 0. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488628518
- now = 1382488626179
- 0. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488628518
- now = 1382488627181
- 0. http://rockies.edu/about/assessment.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488628518
- now = 1382488628183
- 0. http://rockies.edu/about/assessment.htm
- fetching http://rockies.edu/about/assessment.htm (queue crawl delay=5000ms)
- -finishing thread FetcherThread, activeThreads=9
- -finishing thread FetcherThread, activeThreads=8
- -finishing thread FetcherThread, activeThreads=6
- -finishing thread FetcherThread, activeThreads=6
- -finishing thread FetcherThread, activeThreads=5
- -finishing thread FetcherThread, activeThreads=3
- -finishing thread FetcherThread, activeThreads=3
- -finishing thread FetcherThread, activeThreads=2
- -finishing thread FetcherThread, activeThreads=1
- -finishing thread FetcherThread, activeThreads=0
- -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
- -activeThreads=0
- Fetcher: finished at 2013-10-22 17:37:10, elapsed: 00:00:24
- ParseSegment: starting at 2013-10-22 17:37:10
- ParseSegment: segment: crawl-20131022173632/segments/20131022173644
- Parsed (6ms):http://rockies.edu/2193.htm
- Parsed (7ms):http://rockies.edu/RequestMoreInfo.htm
- Parsed (4ms):http://rockies.edu/about.htm
- Parsed (4ms):http://rockies.edu/about/assessment.htm
- ParseSegment: finished at 2013-10-22 17:37:11, elapsed: 00:00:01
- CrawlDb update: starting at 2013-10-22 17:37:11
- CrawlDb update: db: crawl-20131022173632/crawldb
- CrawlDb update: segments: [crawl-20131022173632/segments/20131022173644]
- CrawlDb update: additions allowed: true
- CrawlDb update: URL normalizing: true
- CrawlDb update: URL filtering: true
- CrawlDb update: 404 purging: false
- CrawlDb update: Merging segment data into db.
- CrawlDb update: finished at 2013-10-22 17:37:12, elapsed: 00:00:01
- Generator: starting at 2013-10-22 17:37:12
- Generator: Selecting best-scoring urls due for fetch.
- Generator: filtering: true
- Generator: normalizing: true
- Generator: topN: 5
- Generator: jobtracker is 'local', generating exactly one partition.
- Generator: Partitioning selected urls for politeness.
- Generator: segment: crawl-20131022173632/segments/20131022173714
- Generator: finished at 2013-10-22 17:37:15, elapsed: 00:00:03
- Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
- Fetcher: starting at 2013-10-22 17:37:15
- Fetcher: segment: crawl-20131022173632/segments/20131022173714
- Using queue mode : byHost
- Fetcher: threads: 10
- Fetcher: time-out divisor: 2
- QueueFeeder finished: total 5 records + hit by time limit :0
- Using queue mode : byHost
- fetching http://rockies.edu/about/diversity.htm (queue crawl delay=5000ms)
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Using queue mode : byHost
- Fetcher: throughput threshold: -1
- Fetcher: throughput threshold retries: 5
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488640486
- now = 1382488636467
- 0. http://rockies.edu/about/board.htm
- 1. http://rockies.edu/about/faculty.htm
- 2. http://rockies.edu/about/accreditation.htm
- 3. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488640486
- now = 1382488637468
- 0. http://rockies.edu/about/board.htm
- 1. http://rockies.edu/about/faculty.htm
- 2. http://rockies.edu/about/accreditation.htm
- 3. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488640486
- now = 1382488638470
- 0. http://rockies.edu/about/board.htm
- 1. http://rockies.edu/about/faculty.htm
- 2. http://rockies.edu/about/accreditation.htm
- 3. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488640486
- now = 1382488639472
- 0. http://rockies.edu/about/board.htm
- 1. http://rockies.edu/about/faculty.htm
- 2. http://rockies.edu/about/accreditation.htm
- 3. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488640486
- now = 1382488640474
- 0. http://rockies.edu/about/board.htm
- 1. http://rockies.edu/about/faculty.htm
- 2. http://rockies.edu/about/accreditation.htm
- 3. http://rockies.edu/about/administration.htm
- fetching http://rockies.edu/about/board.htm (queue crawl delay=5000ms)
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488645525
- now = 1382488641476
- 0. http://rockies.edu/about/faculty.htm
- 1. http://rockies.edu/about/accreditation.htm
- 2. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488645525
- now = 1382488642478
- 0. http://rockies.edu/about/faculty.htm
- 1. http://rockies.edu/about/accreditation.htm
- 2. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488645525
- now = 1382488643480
- 0. http://rockies.edu/about/faculty.htm
- 1. http://rockies.edu/about/accreditation.htm
- 2. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488645525
- now = 1382488644482
- 0. http://rockies.edu/about/faculty.htm
- 1. http://rockies.edu/about/accreditation.htm
- 2. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488645525
- now = 1382488645484
- 0. http://rockies.edu/about/faculty.htm
- 1. http://rockies.edu/about/accreditation.htm
- 2. http://rockies.edu/about/administration.htm
- fetching http://rockies.edu/about/faculty.htm (queue crawl delay=5000ms)
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488650569
- now = 1382488646485
- 0. http://rockies.edu/about/accreditation.htm
- 1. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488650569
- now = 1382488647487
- 0. http://rockies.edu/about/accreditation.htm
- 1. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=2
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488650569
- now = 1382488648488
- 0. http://rockies.edu/about/accreditation.htm
- 1. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488650569
- now = 1382488649491
- 0. http://rockies.edu/about/accreditation.htm
- 1. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488650569
- now = 1382488650493
- 0. http://rockies.edu/about/accreditation.htm
- 1. http://rockies.edu/about/administration.htm
- fetching http://rockies.edu/about/accreditation.htm (queue crawl delay=5000ms)
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488655606
- now = 1382488651495
- 0. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488655606
- now = 1382488652496
- 0. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488655606
- now = 1382488653498
- 0. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488655606
- now = 1382488654500
- 0. http://rockies.edu/about/administration.htm
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
- * queue: http://rockies.edu
- maxThreads = 1
- inProgress = 0
- crawlDelay = 5000
- minCrawlDelay = 0
- nextFetchTime = 1382488655606
- now = 1382488655502
- 0. http://rockies.edu/about/administration.htm
- fetching http://rockies.edu/about/administration.htm (queue crawl delay=5000ms)
- -finishing thread FetcherThread, activeThreads=9
- -finishing thread FetcherThread, activeThreads=7
- -finishing thread FetcherThread, activeThreads=7
- -finishing thread FetcherThread, activeThreads=6
- -finishing thread FetcherThread, activeThreads=4
- -finishing thread FetcherThread, activeThreads=3
- -finishing thread FetcherThread, activeThreads=2
- -finishing thread FetcherThread, activeThreads=4
- -finishing thread FetcherThread, activeThreads=1
- -finishing thread FetcherThread, activeThreads=0
- -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
- -activeThreads=0
- Fetcher: finished at 2013-10-22 17:37:37, elapsed: 00:00:22
- ParseSegment: starting at 2013-10-22 17:37:37
- ParseSegment: segment: crawl-20131022173632/segments/20131022173714
- Parsed (3ms):http://rockies.edu/about/accreditation.htm
- Parsed (3ms):http://rockies.edu/about/administration.htm
- Parsed (3ms):http://rockies.edu/about/board.htm
- Parsed (5ms):http://rockies.edu/about/diversity.htm
- Parsed (11ms):http://rockies.edu/about/faculty.htm
- ParseSegment: finished at 2013-10-22 17:37:38, elapsed: 00:00:01
- CrawlDb update: starting at 2013-10-22 17:37:38
- CrawlDb update: db: crawl-20131022173632/crawldb
- CrawlDb update: segments: [crawl-20131022173632/segments/20131022173714]
- CrawlDb update: additions allowed: true
- CrawlDb update: URL normalizing: true
- CrawlDb update: URL filtering: true
- CrawlDb update: 404 purging: false
- CrawlDb update: Merging segment data into db.
- CrawlDb update: finished at 2013-10-22 17:37:39, elapsed: 00:00:01
- LinkDb: starting at 2013-10-22 17:37:39
- LinkDb: linkdb: crawl-20131022173632/linkdb
- LinkDb: URL normalize: true
- LinkDb: URL filter: true
- LinkDb: internal links will be ignored.
- LinkDb: adding segment: file:/Users/mreyes4/Development/project_solr/nutch/crawl-20131022173632/segments/20131022173637
- LinkDb: adding segment: file:/Users/mreyes4/Development/project_solr/nutch/crawl-20131022173632/segments/20131022173644
- LinkDb: adding segment: file:/Users/mreyes4/Development/project_solr/nutch/crawl-20131022173632/segments/20131022173714
- LinkDb: finished at 2013-10-22 17:37:40, elapsed: 00:00:01
- Indexer: starting at 2013-10-22 17:37:40
- Indexer: deleting gone documents: false
- Indexer: URL filtering: false
- Indexer: URL normalizing: false
- Active IndexWriters :
- SOLRIndexWriter
- solr.server.url : URL of the SOLR instance (mandatory)
- solr.commit.size : buffer size when sending to SOLR (default 1000)
- solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
- solr.auth : use authentication (default false)
- solr.auth.username : use authentication (default false)
- solr.auth : username for authentication
- solr.auth.password : password for authentication
- Exception in thread "main" java.io.IOException: Job failed!
- at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
- at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
- at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:81)
- at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:65)
- at org.apache.nutch.crawl.Crawl.run(Crawl.java:155)
- at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
- at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
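The Fetcher queue dumps above print `nextFetchTime` and `now` as epoch milliseconds. The segment directories (for example `20131022173644`) appear to be named after timestamps, since `crawl-20131022173632` matches the "crawl started" time of 17:36:32. Below is a minimal Java sketch for reading those numbers. `CrawlLogMath` and its methods are hypothetical helpers, not part of Nutch. The `yyyyMMddHHmmss` segment-name format is inferred from the log rather than confirmed, and the America/Los_Angeles zone is assumed because it makes the epoch values line up with the local log timestamps. In the first dump of the second Fetcher run, the single `rockies.edu` queue (`maxThreads = 1`, `crawlDelay = 5000`) still had about 4 seconds to wait. That likely explains why `spinWaiting` stays at 10 there.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical helper (not part of Nutch) for interpreting values that
 * appear in a Nutch 1.x crawl log such as the one above.
 */
public class CrawlLogMath {

    /** Milliseconds a fetch queue must still wait before its next fetch; 0 if already due. */
    static long remainingDelayMs(long nextFetchTime, long now) {
        return Math.max(0L, nextFetchTime - now);
    }

    /** Converts an epoch-millisecond value (e.g. nextFetchTime) to local date-time in the given zone. */
    static LocalDateTime toLocal(long epochMs, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMs), zone);
    }

    /** Parses a segment directory name, ASSUMED to be a yyyyMMddHHmmss timestamp. */
    static LocalDateTime segmentTime(String segmentName) {
        return LocalDateTime.parse(segmentName, DateTimeFormatter.ofPattern("yyyyMMddHHmmss"));
    }

    public static void main(String[] args) {
        // Values copied from the first queue dump of the second Fetcher run.
        long nextFetchTime = 1382488611166L;
        long now = 1382488607144L;
        System.out.println("remaining wait: " + remainingDelayMs(nextFetchTime, now) + " ms");

        // Assumed zone: the log's local timestamps match US Pacific (PDT) time.
        ZoneId zone = ZoneId.of("America/Los_Angeles");
        System.out.println("next fetch due at: " + toLocal(nextFetchTime, zone));

        // Segment name taken from the second Generator run.
        System.out.println("segment created at: " + segmentTime("20131022173644"));
    }
}
```

Clamping at zero mirrors how the dumps read: once `now` passes `nextFetchTime`, the next line in the log is a "fetching ..." entry rather than another wait.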