- 2013-01-12 05:37:37,148 INFO crawl.InjectorJob - InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora storage class.
- 2013-01-12 05:37:37,208 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- 2013-01-12 05:37:37,289 WARN snappy.LoadSnappy - Snappy native library not loaded
- 2013-01-12 05:37:37,787 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:37:37,798 INFO plugin.PluginRepository - Plugins: looking in: /opt/searchengine/nutch/plugins
- 2013-01-12 05:37:37,898 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
- 2013-01-12 05:37:37,898 INFO plugin.PluginRepository - Registered Plugins:
- 2013-01-12 05:37:37,898 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
- 2013-01-12 05:37:37,898 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - HTTP Framework (lib-http)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Registered Extension-Points:
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Parse Filter (org.apache.nutch.parse.ParseFilter)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
- 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
- 2013-01-12 05:37:37,988 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
- 2013-01-12 05:37:38,485 WARN mapred.FileOutputCommitter - Output path is null in cleanup
- 2013-01-12 05:37:38,586 INFO crawl.InjectorJob - InjectorJob: total number of urls rejected by filters: 0
- 2013-01-12 05:37:38,587 INFO crawl.InjectorJob - InjectorJob: total number of urls injected after normalization and filtering: 1
- 2013-01-12 05:37:39,086 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
- 2013-01-12 05:37:39,286 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
- 2013-01-12 05:37:39,287 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
- 2013-01-12 05:37:39,287 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
- 2013-01-12 05:37:39,333 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
- 2013-01-12 05:37:39,400 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:37:39,976 WARN mapred.FileOutputCommitter - Output path is null in cleanup
- 2013-01-12 05:37:40,963 INFO fetcher.FetcherJob - FetcherJob: threads: 10
- 2013-01-12 05:37:40,963 INFO fetcher.FetcherJob - FetcherJob: parsing: false
- 2013-01-12 05:37:40,963 INFO fetcher.FetcherJob - FetcherJob: resuming: false
- 2013-01-12 05:37:40,964 INFO fetcher.FetcherJob - FetcherJob : timelimit set for : -1
- 2013-01-12 05:37:41,251 INFO http.Http - http.proxy.host = null
- 2013-01-12 05:37:41,252 INFO http.Http - http.proxy.port = 8080
- 2013-01-12 05:37:41,252 INFO http.Http - http.timeout = 10000
- 2013-01-12 05:37:41,252 INFO http.Http - http.content.limit = 65536
- 2013-01-12 05:37:41,252 INFO http.Http - http.agent = nutch-solr-integration-test/1.6 (BW Web Crawler using Nutch 1.6; http://billydekid.wordpress.com/; bwidyasanyata@gmail.com)
- 2013-01-12 05:37:41,252 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
- 2013-01-12 05:37:41,252 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
- 2013-01-12 05:37:41,589 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
- 2013-01-12 05:37:41,942 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:37:41,944 INFO fetcher.FetcherJob - Using queue mode : byHost
- 2013-01-12 05:37:41,944 INFO fetcher.FetcherJob - Fetcher: threads: 10
- 2013-01-12 05:37:41,957 INFO fetcher.FetcherJob - QueueFeeder finished: total 1 records. Hit by time limit :0
- 2013-01-12 05:37:41,969 INFO fetcher.FetcherJob - Fetcher: throughput threshold: -1
- 2013-01-12 05:37:41,970 INFO fetcher.FetcherJob - Fetcher: throughput threshold sequence: 5
- 2013-01-12 05:37:41,987 INFO fetcher.FetcherJob - fetching http://localhost/
- 2013-01-12 05:37:41,989 INFO http.Http - http.proxy.host = null
- 2013-01-12 05:37:41,989 INFO http.Http - http.proxy.port = 8080
- 2013-01-12 05:37:41,989 INFO http.Http - http.timeout = 10000
- 2013-01-12 05:37:41,989 INFO http.Http - http.content.limit = 65536
- 2013-01-12 05:37:41,989 INFO http.Http - http.agent = nutch-solr-integration-test/1.6 (BW Web Crawler using Nutch 1.6; http://billydekid.wordpress.com/; bwidyasanyata@gmail.com)
- 2013-01-12 05:37:41,989 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
- 2013-01-12 05:37:41,989 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
- 2013-01-12 05:37:41,990 INFO fetcher.FetcherJob - -finishing thread FetcherThread8, activeThreads=9
- 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread0, activeThreads=8
- 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread1, activeThreads=7
- 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread2, activeThreads=6
- 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread3, activeThreads=5
- 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread4, activeThreads=4
- 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread5, activeThreads=3
- 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread6, activeThreads=2
- 2013-01-12 05:37:41,992 INFO fetcher.FetcherJob - -finishing thread FetcherThread9, activeThreads=1
- 2013-01-12 05:37:42,067 INFO fetcher.FetcherJob - -finishing thread FetcherThread7, activeThreads=0
- 2013-01-12 05:37:46,970 INFO fetcher.FetcherJob - 0/0 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 2 2 kb/s, 0 URLs in 0 queues
- 2013-01-12 05:37:46,971 INFO fetcher.FetcherJob - -activeThreads=0
- 2013-01-12 05:37:47,037 WARN mapred.FileOutputCommitter - Output path is null in cleanup
- 2013-01-12 05:37:47,410 INFO parse.ParserJob - ParserJob: resuming: false
- 2013-01-12 05:37:47,411 INFO parse.ParserJob - ParserJob: forced reparse: false
- 2013-01-12 05:37:47,411 INFO parse.ParserJob - ParserJob: parsing all
- 2013-01-12 05:37:47,886 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
- 2013-01-12 05:37:48,111 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
- 2013-01-12 05:37:48,155 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:37:48,161 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
- 2013-01-12 05:37:48,216 INFO parse.ParserJob - Parsing http://localhost/
- 2013-01-12 05:37:48,766 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
- 2013-01-12 05:37:48,898 WARN mapred.FileOutputCommitter - Output path is null in cleanup
- 2013-01-12 05:37:49,428 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
- 2013-01-12 05:37:49,578 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:37:49,579 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
- 2013-01-12 05:37:49,579 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
- 2013-01-12 05:37:49,579 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
- 2013-01-12 05:37:49,679 WARN mapred.FileOutputCommitter - Output path is null in cleanup
- 2013-01-12 05:37:50,611 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
- 2013-01-12 05:37:50,884 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
- 2013-01-12 05:37:50,884 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
- 2013-01-12 05:37:50,884 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
- 2013-01-12 05:37:50,909 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
- 2013-01-12 05:37:51,011 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:37:51,162 WARN mapred.FileOutputCommitter - Output path is null in cleanup
- 2013-01-12 05:37:51,530 INFO fetcher.FetcherJob - FetcherJob: threads: 10
- 2013-01-12 05:37:51,531 INFO fetcher.FetcherJob - FetcherJob: parsing: false
- 2013-01-12 05:37:51,531 INFO fetcher.FetcherJob - FetcherJob: resuming: false
- 2013-01-12 05:37:51,531 INFO fetcher.FetcherJob - FetcherJob : timelimit set for : -1
- 2013-01-12 05:37:51,533 INFO http.Http - http.proxy.host = null
- 2013-01-12 05:37:51,533 INFO http.Http - http.proxy.port = 8080
- 2013-01-12 05:37:51,533 INFO http.Http - http.timeout = 10000
- 2013-01-12 05:37:51,533 INFO http.Http - http.content.limit = 65536
- 2013-01-12 05:37:51,533 INFO http.Http - http.agent = nutch-solr-integration-test/1.6 (BW Web Crawler using Nutch 1.6; http://billydekid.wordpress.com/; bwidyasanyata@gmail.com)
- 2013-01-12 05:37:51,533 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
- 2013-01-12 05:37:51,533 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
- 2013-01-12 05:37:51,709 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
- 2013-01-12 05:37:51,890 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:37:51,891 INFO fetcher.FetcherJob - Using queue mode : byHost
- 2013-01-12 05:37:51,891 INFO fetcher.FetcherJob - Fetcher: threads: 10
- 2013-01-12 05:37:51,900 INFO fetcher.FetcherJob - Fetcher: throughput threshold: -1
- 2013-01-12 05:37:51,901 INFO fetcher.FetcherJob - Fetcher: throughput threshold sequence: 5
- 2013-01-12 05:37:51,902 INFO fetcher.FetcherJob - fetching http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
- 2013-01-12 05:37:51,903 INFO http.Http - http.proxy.host = null
- 2013-01-12 05:37:51,903 INFO http.Http - http.proxy.port = 8080
- 2013-01-12 05:37:51,903 INFO http.Http - http.timeout = 10000
- 2013-01-12 05:37:51,903 INFO http.Http - http.content.limit = 65536
- 2013-01-12 05:37:51,904 INFO http.Http - http.agent = nutch-solr-integration-test/1.6 (BW Web Crawler using Nutch 1.6; http://billydekid.wordpress.com/; bwidyasanyata@gmail.com)
- 2013-01-12 05:37:51,904 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
- 2013-01-12 05:37:51,904 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
- 2013-01-12 05:37:51,909 INFO fetcher.FetcherJob - QueueFeeder finished: total 6 records. Hit by time limit :0
- 2013-01-12 05:37:56,901 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 64 64 kb/s, 5 URLs in 1 queues
- 2013-01-12 05:37:56,915 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Solr-install-v2.pdf
- 2013-01-12 05:38:01,901 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 2 pages, 0 errors, 0.2 0.2 pages/s, 83 102 kb/s, 4 URLs in 1 queues
- 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - * queue: http://localhost
- 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - maxThreads = 1
- 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - inProgress = 0
- 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - crawlDelay = 5000
- 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - minCrawlDelay = 0
- 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - nextFetchTime = 1357943881920
- 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - now = 1357943881902
- 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - 0. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
- 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - 1. http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
- 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - 2. http://localhost/
- 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - 3. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
- 2013-01-12 05:38:01,924 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
- 2013-01-12 05:38:06,903 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 3 pages, 0 errors, 0.2 0.2 pages/s, 76 64 kb/s, 3 URLs in 1 queues
- 2013-01-12 05:38:06,903 INFO fetcher.FetcherJob - * queue: http://localhost
- 2013-01-12 05:38:06,903 INFO fetcher.FetcherJob - maxThreads = 1
- 2013-01-12 05:38:06,903 INFO fetcher.FetcherJob - inProgress = 0
- 2013-01-12 05:38:06,903 INFO fetcher.FetcherJob - crawlDelay = 5000
- 2013-01-12 05:38:06,904 INFO fetcher.FetcherJob - minCrawlDelay = 0
- 2013-01-12 05:38:06,904 INFO fetcher.FetcherJob - nextFetchTime = 1357943886927
- 2013-01-12 05:38:06,904 INFO fetcher.FetcherJob - now = 1357943886904
- 2013-01-12 05:38:06,904 INFO fetcher.FetcherJob - 0. http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
- 2013-01-12 05:38:06,904 INFO fetcher.FetcherJob - 1. http://localhost/
- 2013-01-12 05:38:06,904 INFO fetcher.FetcherJob - 2. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
- 2013-01-12 05:38:06,930 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
- 2013-01-12 05:38:11,905 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 4 pages, 0 errors, 0.2 0.2 pages/s, 63 22 kb/s, 2 URLs in 1 queues
- 2013-01-12 05:38:11,905 INFO fetcher.FetcherJob - * queue: http://localhost
- 2013-01-12 05:38:11,905 INFO fetcher.FetcherJob - maxThreads = 1
- 2013-01-12 05:38:11,905 INFO fetcher.FetcherJob - inProgress = 0
- 2013-01-12 05:38:11,906 INFO fetcher.FetcherJob - crawlDelay = 5000
- 2013-01-12 05:38:11,906 INFO fetcher.FetcherJob - minCrawlDelay = 0
- 2013-01-12 05:38:11,906 INFO fetcher.FetcherJob - nextFetchTime = 1357943891933
- 2013-01-12 05:38:11,906 INFO fetcher.FetcherJob - now = 1357943891906
- 2013-01-12 05:38:11,906 INFO fetcher.FetcherJob - 0. http://localhost/
- 2013-01-12 05:38:11,906 INFO fetcher.FetcherJob - 1. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
- 2013-01-12 05:38:11,937 INFO fetcher.FetcherJob - fetching http://localhost/
- 2013-01-12 05:38:16,906 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 5 pages, 0 errors, 0.2 0.2 pages/s, 51 2 kb/s, 1 URLs in 1 queues
- 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - * queue: http://localhost
- 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - maxThreads = 1
- 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - inProgress = 0
- 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - crawlDelay = 5000
- 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - minCrawlDelay = 0
- 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - nextFetchTime = 1357943896945
- 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - now = 1357943896907
- 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - 0. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
- 2013-01-12 05:38:16,949 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
- 2013-01-12 05:38:16,954 INFO fetcher.FetcherJob - -finishing thread FetcherThread0, activeThreads=9
- 2013-01-12 05:38:17,416 INFO fetcher.FetcherJob - -finishing thread FetcherThread3, activeThreads=8
- 2013-01-12 05:38:17,416 INFO fetcher.FetcherJob - -finishing thread FetcherThread8, activeThreads=7
- 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread7, activeThreads=6
- 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread1, activeThreads=4
- 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread6, activeThreads=2
- 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread9, activeThreads=5
- 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread4, activeThreads=0
- 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread5, activeThreads=1
- 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread2, activeThreads=3
- 2013-01-12 05:38:21,908 INFO fetcher.FetcherJob - 0/0 spinwaiting/active, 6 pages, 0 errors, 0.2 0.2 pages/s, 46 22 kb/s, 0 URLs in 0 queues
- 2013-01-12 05:38:21,908 INFO fetcher.FetcherJob - -activeThreads=0
- 2013-01-12 05:38:24,882 WARN mapred.FileOutputCommitter - Output path is null in cleanup
- 2013-01-12 05:38:25,683 INFO parse.ParserJob - ParserJob: resuming: false
- 2013-01-12 05:38:25,683 INFO parse.ParserJob - ParserJob: forced reparse: false
- 2013-01-12 05:38:25,683 INFO parse.ParserJob - ParserJob: parsing all
- 2013-01-12 05:38:25,785 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
- 2013-01-12 05:38:26,066 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
- 2013-01-12 05:38:26,081 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:38:26,084 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
- 2013-01-12 05:38:26,102 INFO parse.ParserJob - Parsing http://localhost/
- 2013-01-12 05:38:26,110 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
- 2013-01-12 05:38:26,116 INFO parse.ParserJob - Parsing http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
- 2013-01-12 05:38:26,116 INFO parse.ParserFactory - The parsing plugins: [org.apache.nutch.parse.tika.TikaParser] are enabled via the plugin.includes system property, and all claim to support the content type application/vnd.oasis.opendocument.text, but they are not mapped to it in the parse-plugins.xml file
- 2013-01-12 05:38:26,251 INFO parse.ParserJob - Parsing http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
- 2013-01-12 05:38:27,428 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf of type application/pdf
- 2013-01-12 05:38:27,436 INFO parse.ParserJob - Parsing http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
- 2013-01-12 05:38:27,453 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt of type application/vnd.oasis.opendocument.text
- 2013-01-12 05:38:27,456 INFO parse.ParserJob - Parsing http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
- 2013-01-12 05:38:27,502 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf of type application/pdf
- 2013-01-12 05:38:27,508 INFO parse.ParserJob - Parsing http://localhost/sapi/Solr-install-v2.pdf
- 2013-01-12 05:38:27,508 WARN parse.ParserJob - http://localhost/sapi/Solr-install-v2.pdf skipped. Content of size 395125 was truncated to 65536
- 2013-01-12 05:38:28,132 WARN mapred.FileOutputCommitter - Output path is null in cleanup
- 2013-01-12 05:38:29,184 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
- 2013-01-12 05:38:29,372 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:38:29,372 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
- 2013-01-12 05:38:29,372 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
- 2013-01-12 05:38:29,372 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
- 2013-01-12 05:38:29,447 WARN mapred.FileOutputCommitter - Output path is null in cleanup
- 2013-01-12 05:38:30,337 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
- 2013-01-12 05:38:30,466 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
- 2013-01-12 05:38:30,467 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
- 2013-01-12 05:38:30,467 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
- 2013-01-12 05:38:30,476 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
- 2013-01-12 05:38:30,508 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:38:30,563 WARN mapred.FileOutputCommitter - Output path is null in cleanup
- 2013-01-12 05:38:31,203 INFO fetcher.FetcherJob - FetcherJob: threads: 10
- 2013-01-12 05:38:31,204 INFO fetcher.FetcherJob - FetcherJob: parsing: false
- 2013-01-12 05:38:31,204 INFO fetcher.FetcherJob - FetcherJob: resuming: false
- 2013-01-12 05:38:31,204 INFO fetcher.FetcherJob - FetcherJob : timelimit set for : -1
- 2013-01-12 05:38:31,205 INFO http.Http - http.proxy.host = null
- 2013-01-12 05:38:31,205 INFO http.Http - http.proxy.port = 8080
- 2013-01-12 05:38:31,205 INFO http.Http - http.timeout = 10000
- 2013-01-12 05:38:31,205 INFO http.Http - http.content.limit = 65536
- 2013-01-12 05:38:31,205 INFO http.Http - http.agent = nutch-solr-integration-test/1.6 (BW Web Crawler using Nutch 1.6; http://billydekid.wordpress.com/; bwidyasanyata@gmail.com)
- 2013-01-12 05:38:31,205 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
- 2013-01-12 05:38:31,205 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
- 2013-01-12 05:38:31,329 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
- 2013-01-12 05:38:31,484 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:38:31,484 INFO fetcher.FetcherJob - Using queue mode : byHost
- 2013-01-12 05:38:31,484 INFO fetcher.FetcherJob - Fetcher: threads: 10
- 2013-01-12 05:38:31,505 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
- 2013-01-12 05:38:31,506 INFO http.Http - http.proxy.host = null
- 2013-01-12 05:38:31,506 INFO http.Http - http.proxy.port = 8080
- 2013-01-12 05:38:31,506 INFO http.Http - http.timeout = 10000
- 2013-01-12 05:38:31,506 INFO http.Http - http.content.limit = 65536
- 2013-01-12 05:38:31,506 INFO http.Http - http.agent = nutch-solr-integration-test/1.6 (BW Web Crawler using Nutch 1.6; http://billydekid.wordpress.com/; bwidyasanyata@gmail.com)
- 2013-01-12 05:38:31,506 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
- 2013-01-12 05:38:31,506 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
- 2013-01-12 05:38:31,509 INFO fetcher.FetcherJob - Fetcher: throughput threshold: -1
- 2013-01-12 05:38:31,509 INFO fetcher.FetcherJob - Fetcher: throughput threshold sequence: 5
- 2013-01-12 05:38:31,511 INFO fetcher.FetcherJob - QueueFeeder finished: total 7 records. Hit by time limit :0
- 2013-01-12 05:38:36,509 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 64 64 kb/s, 6 URLs in 1 queues
- 2013-01-12 05:38:36,513 INFO fetcher.FetcherJob - fetching http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
- 2013-01-12 05:38:41,510 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 2 pages, 0 errors, 0.2 0.2 pages/s, 64 64 kb/s, 5 URLs in 1 queues
- 2013-01-12 05:38:41,520 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Solr-install-v2.pdf
- 2013-01-12 05:38:46,510 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 3 pages, 0 errors, 0.2 0.2 pages/s, 76 102 kb/s, 4 URLs in 1 queues
- 2013-01-12 05:38:46,510 INFO fetcher.FetcherJob - * queue: http://localhost
- 2013-01-12 05:38:46,510 INFO fetcher.FetcherJob - maxThreads = 1
- 2013-01-12 05:38:46,510 INFO fetcher.FetcherJob - inProgress = 0
- 2013-01-12 05:38:46,510 INFO fetcher.FetcherJob - crawlDelay = 5000
- 2013-01-12 05:38:46,510 INFO fetcher.FetcherJob - minCrawlDelay = 0
- 2013-01-12 05:38:46,511 INFO fetcher.FetcherJob - nextFetchTime = 1357943926525
- 2013-01-12 05:38:46,511 INFO fetcher.FetcherJob - now = 1357943926511
- 2013-01-12 05:38:46,511 INFO fetcher.FetcherJob - 0. http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
- 2013-01-12 05:38:46,511 INFO fetcher.FetcherJob - 1. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
- 2013-01-12 05:38:46,511 INFO fetcher.FetcherJob - 2. http://localhost/
- 2013-01-12 05:38:46,511 INFO fetcher.FetcherJob - 3. http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
- 2013-01-12 05:38:46,528 INFO fetcher.FetcherJob - fetching http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
- 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 4 pages, 0 errors, 0.2 0.2 pages/s, 73 64 kb/s, 3 URLs in 1 queues
- 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - * queue: http://localhost
- 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - maxThreads = 1
- 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - inProgress = 0
- 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - crawlDelay = 5000
- 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - minCrawlDelay = 0
- 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - nextFetchTime = 1357943931531
- 2013-01-12 05:38:51,512 INFO fetcher.FetcherJob - now = 1357943931512
- 2013-01-12 05:38:51,512 INFO fetcher.FetcherJob - 0. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
- 2013-01-12 05:38:51,512 INFO fetcher.FetcherJob - 1. http://localhost/
- 2013-01-12 05:38:51,512 INFO fetcher.FetcherJob - 2. http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
- 2013-01-12 05:38:51,533 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
- 2013-01-12 05:38:56,512 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 5 pages, 0 errors, 0.2 0.2 pages/s, 63 22 kb/s, 2 URLs in 1 queues
- 2013-01-12 05:38:56,512 INFO fetcher.FetcherJob - * queue: http://localhost
- 2013-01-12 05:38:56,512 INFO fetcher.FetcherJob - maxThreads = 1
- 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - inProgress = 0
- 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - crawlDelay = 5000
- 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - minCrawlDelay = 0
- 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - nextFetchTime = 1357943936535
- 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - now = 1357943936513
- 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - 0. http://localhost/
- 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - 1. http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
- 2013-01-12 05:38:56,538 INFO fetcher.FetcherJob - fetching http://localhost/
- 2013-01-12 05:39:01,513 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 6 pages, 0 errors, 0.2 0.2 pages/s, 53 2 kb/s, 1 URLs in 1 queues
- 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - * queue: http://localhost
- 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - maxThreads = 1
- 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - inProgress = 0
- 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - crawlDelay = 5000
- 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - minCrawlDelay = 0
- 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - nextFetchTime = 1357943941546
- 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - now = 1357943941514
- 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - 0. http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
- 2013-01-12 05:39:01,549 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
- 2013-01-12 05:39:01,552 INFO fetcher.FetcherJob - -finishing thread FetcherThread6, activeThreads=9
- 2013-01-12 05:39:02,023 INFO fetcher.FetcherJob - -finishing thread FetcherThread4, activeThreads=7
- 2013-01-12 05:39:02,023 INFO fetcher.FetcherJob - -finishing thread FetcherThread5, activeThreads=7
- 2013-01-12 05:39:02,024 INFO fetcher.FetcherJob - -finishing thread FetcherThread1, activeThreads=6
- 2013-01-12 05:39:02,024 INFO fetcher.FetcherJob - -finishing thread FetcherThread8, activeThreads=4
- 2013-01-12 05:39:02,024 INFO fetcher.FetcherJob - -finishing thread FetcherThread3, activeThreads=5
- 2013-01-12 05:39:02,029 INFO fetcher.FetcherJob - -finishing thread FetcherThread0, activeThreads=2
- 2013-01-12 05:39:02,029 INFO fetcher.FetcherJob - -finishing thread FetcherThread2, activeThreads=2
- 2013-01-12 05:39:02,030 INFO fetcher.FetcherJob - -finishing thread FetcherThread9, activeThreads=1
- 2013-01-12 05:39:02,030 INFO fetcher.FetcherJob - -finishing thread FetcherThread7, activeThreads=0
- 2013-01-12 05:39:06,515 INFO fetcher.FetcherJob - 0/0 spinwaiting/active, 7 pages, 0 errors, 0.2 0.2 pages/s, 48 22 kb/s, 0 URLs in 0 queues
- 2013-01-12 05:39:06,515 INFO fetcher.FetcherJob - -activeThreads=0
- 2013-01-12 05:39:06,965 WARN mapred.FileOutputCommitter - Output path is null in cleanup
- 2013-01-12 05:39:07,300 INFO parse.ParserJob - ParserJob: resuming: false
- 2013-01-12 05:39:07,300 INFO parse.ParserJob - ParserJob: forced reparse: false
- 2013-01-12 05:39:07,300 INFO parse.ParserJob - ParserJob: parsing all
- 2013-01-12 05:39:07,374 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
- 2013-01-12 05:39:07,525 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
- 2013-01-12 05:39:07,541 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:39:07,556 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
- 2013-01-12 05:39:07,575 INFO parse.ParserJob - Parsing http://localhost/
- 2013-01-12 05:39:07,583 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
- 2013-01-12 05:39:07,594 INFO parse.ParserJob - Parsing http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
- 2013-01-12 05:39:07,594 INFO parse.ParserFactory - The parsing plugins: [org.apache.nutch.parse.tika.TikaParser] are enabled via the plugin.includes system property, and all claim to support the content type application/vnd.oasis.opendocument.text, but they are not mapped to it in the parse-plugins.xml file
- 2013-01-12 05:39:07,704 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt of type application/vnd.oasis.opendocument.text
- 2013-01-12 05:39:07,708 INFO parse.ParserJob - Parsing http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
- 2013-01-12 05:39:07,763 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf of type application/pdf
- 2013-01-12 05:39:07,770 INFO parse.ParserJob - Parsing http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
- 2013-01-12 05:39:07,797 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt of type application/vnd.oasis.opendocument.text
- 2013-01-12 05:39:07,801 INFO parse.ParserJob - Parsing http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
- 2013-01-12 05:39:07,873 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf of type application/pdf
- 2013-01-12 05:39:07,880 INFO parse.ParserJob - Parsing http://localhost/sapi/Solr-install-v2.pdf
- 2013-01-12 05:39:07,880 WARN parse.ParserJob - http://localhost/sapi/Solr-install-v2.pdf skipped. Content of size 395125 was truncated to 65536
- 2013-01-12 05:39:07,881 INFO parse.ParserJob - Parsing http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
- 2013-01-12 05:39:07,931 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf of type application/pdf
- 2013-01-12 05:39:08,062 WARN mapred.FileOutputCommitter - Output path is null in cleanup
- 2013-01-12 05:39:08,634 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
- 2013-01-12 05:39:09,469 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
- 2013-01-12 05:39:09,469 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
- 2013-01-12 05:39:09,469 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
- 2013-01-12 05:39:09,469 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
- 2013-01-12 05:39:09,731 WARN mapred.FileOutputCommitter - Output path is null in cleanup