billydekid

hadoop.log

Jan 11th, 2013
  1. 2013-01-12 05:37:37,148 INFO crawl.InjectorJob - InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora storage class.
  2. 2013-01-12 05:37:37,208 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  3. 2013-01-12 05:37:37,289 WARN snappy.LoadSnappy - Snappy native library not loaded
  4. 2013-01-12 05:37:37,787 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  5. 2013-01-12 05:37:37,798 INFO plugin.PluginRepository - Plugins: looking in: /opt/searchengine/nutch/plugins
  6. 2013-01-12 05:37:37,898 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
  7. 2013-01-12 05:37:37,898 INFO plugin.PluginRepository - Registered Plugins:
  8. 2013-01-12 05:37:37,898 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
  9. 2013-01-12 05:37:37,898 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
  10. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
  11. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
  12. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
  13. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
  14. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
  15. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
  16. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
  17. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - HTTP Framework (lib-http)
  18. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
  19. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
  20. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
  21. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
  22. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Registered Extension-Points:
  23. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
  24. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
  25. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Parse Filter (org.apache.nutch.parse.ParseFilter)
  26. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
  27. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
  28. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
  29. 2013-01-12 05:37:37,899 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
  30. 2013-01-12 05:37:37,988 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
  31. 2013-01-12 05:37:38,485 WARN mapred.FileOutputCommitter - Output path is null in cleanup
  32. 2013-01-12 05:37:38,586 INFO crawl.InjectorJob - InjectorJob: total number of urls rejected by filters: 0
  33. 2013-01-12 05:37:38,587 INFO crawl.InjectorJob - InjectorJob: total number of urls injected after normalization and filtering: 1
  34. 2013-01-12 05:37:39,086 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
  35. 2013-01-12 05:37:39,286 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
  36. 2013-01-12 05:37:39,287 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
  37. 2013-01-12 05:37:39,287 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
  38. 2013-01-12 05:37:39,333 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
  39. 2013-01-12 05:37:39,400 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  40. 2013-01-12 05:37:39,976 WARN mapred.FileOutputCommitter - Output path is null in cleanup
  41. 2013-01-12 05:37:40,963 INFO fetcher.FetcherJob - FetcherJob: threads: 10
  42. 2013-01-12 05:37:40,963 INFO fetcher.FetcherJob - FetcherJob: parsing: false
  43. 2013-01-12 05:37:40,963 INFO fetcher.FetcherJob - FetcherJob: resuming: false
  44. 2013-01-12 05:37:40,964 INFO fetcher.FetcherJob - FetcherJob : timelimit set for : -1
  45. 2013-01-12 05:37:41,251 INFO http.Http - http.proxy.host = null
  46. 2013-01-12 05:37:41,252 INFO http.Http - http.proxy.port = 8080
  47. 2013-01-12 05:37:41,252 INFO http.Http - http.timeout = 10000
  48. 2013-01-12 05:37:41,252 INFO http.Http - http.content.limit = 65536
  49. 2013-01-12 05:37:41,252 INFO http.Http - http.agent = nutch-solr-integration-test/1.6 (BW Web Crawler using Nutch 1.6; http://billydekid.wordpress.com/; bwidyasanyata@gmail.com)
  50. 2013-01-12 05:37:41,252 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
  51. 2013-01-12 05:37:41,252 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  52. 2013-01-12 05:37:41,589 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
  53. 2013-01-12 05:37:41,942 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  54. 2013-01-12 05:37:41,944 INFO fetcher.FetcherJob - Using queue mode : byHost
  55. 2013-01-12 05:37:41,944 INFO fetcher.FetcherJob - Fetcher: threads: 10
  56. 2013-01-12 05:37:41,957 INFO fetcher.FetcherJob - QueueFeeder finished: total 1 records. Hit by time limit :0
  57. 2013-01-12 05:37:41,969 INFO fetcher.FetcherJob - Fetcher: throughput threshold: -1
  58. 2013-01-12 05:37:41,970 INFO fetcher.FetcherJob - Fetcher: throughput threshold sequence: 5
  59. 2013-01-12 05:37:41,987 INFO fetcher.FetcherJob - fetching http://localhost/
  60. 2013-01-12 05:37:41,989 INFO http.Http - http.proxy.host = null
  61. 2013-01-12 05:37:41,989 INFO http.Http - http.proxy.port = 8080
  62. 2013-01-12 05:37:41,989 INFO http.Http - http.timeout = 10000
  63. 2013-01-12 05:37:41,989 INFO http.Http - http.content.limit = 65536
  64. 2013-01-12 05:37:41,989 INFO http.Http - http.agent = nutch-solr-integration-test/1.6 (BW Web Crawler using Nutch 1.6; http://billydekid.wordpress.com/; bwidyasanyata@gmail.com)
  65. 2013-01-12 05:37:41,989 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
  66. 2013-01-12 05:37:41,989 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  67. 2013-01-12 05:37:41,990 INFO fetcher.FetcherJob - -finishing thread FetcherThread8, activeThreads=9
  68. 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread0, activeThreads=8
  69. 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread1, activeThreads=7
  70. 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread2, activeThreads=6
  71. 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread3, activeThreads=5
  72. 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread4, activeThreads=4
  73. 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread5, activeThreads=3
  74. 2013-01-12 05:37:41,991 INFO fetcher.FetcherJob - -finishing thread FetcherThread6, activeThreads=2
  75. 2013-01-12 05:37:41,992 INFO fetcher.FetcherJob - -finishing thread FetcherThread9, activeThreads=1
  76. 2013-01-12 05:37:42,067 INFO fetcher.FetcherJob - -finishing thread FetcherThread7, activeThreads=0
  77. 2013-01-12 05:37:46,970 INFO fetcher.FetcherJob - 0/0 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 2 2 kb/s, 0 URLs in 0 queues
  78. 2013-01-12 05:37:46,971 INFO fetcher.FetcherJob - -activeThreads=0
  79. 2013-01-12 05:37:47,037 WARN mapred.FileOutputCommitter - Output path is null in cleanup
  80. 2013-01-12 05:37:47,410 INFO parse.ParserJob - ParserJob: resuming: false
  81. 2013-01-12 05:37:47,411 INFO parse.ParserJob - ParserJob: forced reparse: false
  82. 2013-01-12 05:37:47,411 INFO parse.ParserJob - ParserJob: parsing all
  83. 2013-01-12 05:37:47,886 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
  84. 2013-01-12 05:37:48,111 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
  85. 2013-01-12 05:37:48,155 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  86. 2013-01-12 05:37:48,161 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
  87. 2013-01-12 05:37:48,216 INFO parse.ParserJob - Parsing http://localhost/
  88. 2013-01-12 05:37:48,766 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
  89. 2013-01-12 05:37:48,898 WARN mapred.FileOutputCommitter - Output path is null in cleanup
  90. 2013-01-12 05:37:49,428 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
  91. 2013-01-12 05:37:49,578 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  92. 2013-01-12 05:37:49,579 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
  93. 2013-01-12 05:37:49,579 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
  94. 2013-01-12 05:37:49,579 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
  95. 2013-01-12 05:37:49,679 WARN mapred.FileOutputCommitter - Output path is null in cleanup
  96. 2013-01-12 05:37:50,611 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
  97. 2013-01-12 05:37:50,884 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
  98. 2013-01-12 05:37:50,884 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
  99. 2013-01-12 05:37:50,884 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
  100. 2013-01-12 05:37:50,909 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
  101. 2013-01-12 05:37:51,011 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  102. 2013-01-12 05:37:51,162 WARN mapred.FileOutputCommitter - Output path is null in cleanup
  103. 2013-01-12 05:37:51,530 INFO fetcher.FetcherJob - FetcherJob: threads: 10
  104. 2013-01-12 05:37:51,531 INFO fetcher.FetcherJob - FetcherJob: parsing: false
  105. 2013-01-12 05:37:51,531 INFO fetcher.FetcherJob - FetcherJob: resuming: false
  106. 2013-01-12 05:37:51,531 INFO fetcher.FetcherJob - FetcherJob : timelimit set for : -1
  107. 2013-01-12 05:37:51,533 INFO http.Http - http.proxy.host = null
  108. 2013-01-12 05:37:51,533 INFO http.Http - http.proxy.port = 8080
  109. 2013-01-12 05:37:51,533 INFO http.Http - http.timeout = 10000
  110. 2013-01-12 05:37:51,533 INFO http.Http - http.content.limit = 65536
  111. 2013-01-12 05:37:51,533 INFO http.Http - http.agent = nutch-solr-integration-test/1.6 (BW Web Crawler using Nutch 1.6; http://billydekid.wordpress.com/; bwidyasanyata@gmail.com)
  112. 2013-01-12 05:37:51,533 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
  113. 2013-01-12 05:37:51,533 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  114. 2013-01-12 05:37:51,709 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
  115. 2013-01-12 05:37:51,890 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  116. 2013-01-12 05:37:51,891 INFO fetcher.FetcherJob - Using queue mode : byHost
  117. 2013-01-12 05:37:51,891 INFO fetcher.FetcherJob - Fetcher: threads: 10
  118. 2013-01-12 05:37:51,900 INFO fetcher.FetcherJob - Fetcher: throughput threshold: -1
  119. 2013-01-12 05:37:51,901 INFO fetcher.FetcherJob - Fetcher: throughput threshold sequence: 5
  120. 2013-01-12 05:37:51,902 INFO fetcher.FetcherJob - fetching http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
  121. 2013-01-12 05:37:51,903 INFO http.Http - http.proxy.host = null
  122. 2013-01-12 05:37:51,903 INFO http.Http - http.proxy.port = 8080
  123. 2013-01-12 05:37:51,903 INFO http.Http - http.timeout = 10000
  124. 2013-01-12 05:37:51,903 INFO http.Http - http.content.limit = 65536
  125. 2013-01-12 05:37:51,904 INFO http.Http - http.agent = nutch-solr-integration-test/1.6 (BW Web Crawler using Nutch 1.6; http://billydekid.wordpress.com/; bwidyasanyata@gmail.com)
  126. 2013-01-12 05:37:51,904 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
  127. 2013-01-12 05:37:51,904 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  128. 2013-01-12 05:37:51,909 INFO fetcher.FetcherJob - QueueFeeder finished: total 6 records. Hit by time limit :0
  129. 2013-01-12 05:37:56,901 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 64 64 kb/s, 5 URLs in 1 queues
  130. 2013-01-12 05:37:56,915 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Solr-install-v2.pdf
  131. 2013-01-12 05:38:01,901 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 2 pages, 0 errors, 0.2 0.2 pages/s, 83 102 kb/s, 4 URLs in 1 queues
  132. 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - * queue: http://localhost
  133. 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - maxThreads = 1
  134. 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - inProgress = 0
  135. 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - crawlDelay = 5000
  136. 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - minCrawlDelay = 0
  137. 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - nextFetchTime = 1357943881920
  138. 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - now = 1357943881902
  139. 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - 0. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
  140. 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - 1. http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
  141. 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - 2. http://localhost/
  142. 2013-01-12 05:38:01,902 INFO fetcher.FetcherJob - 3. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
  143. 2013-01-12 05:38:01,924 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
  144. 2013-01-12 05:38:06,903 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 3 pages, 0 errors, 0.2 0.2 pages/s, 76 64 kb/s, 3 URLs in 1 queues
  145. 2013-01-12 05:38:06,903 INFO fetcher.FetcherJob - * queue: http://localhost
  146. 2013-01-12 05:38:06,903 INFO fetcher.FetcherJob - maxThreads = 1
  147. 2013-01-12 05:38:06,903 INFO fetcher.FetcherJob - inProgress = 0
  148. 2013-01-12 05:38:06,903 INFO fetcher.FetcherJob - crawlDelay = 5000
  149. 2013-01-12 05:38:06,904 INFO fetcher.FetcherJob - minCrawlDelay = 0
  150. 2013-01-12 05:38:06,904 INFO fetcher.FetcherJob - nextFetchTime = 1357943886927
  151. 2013-01-12 05:38:06,904 INFO fetcher.FetcherJob - now = 1357943886904
  152. 2013-01-12 05:38:06,904 INFO fetcher.FetcherJob - 0. http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
  153. 2013-01-12 05:38:06,904 INFO fetcher.FetcherJob - 1. http://localhost/
  154. 2013-01-12 05:38:06,904 INFO fetcher.FetcherJob - 2. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
  155. 2013-01-12 05:38:06,930 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
  156. 2013-01-12 05:38:11,905 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 4 pages, 0 errors, 0.2 0.2 pages/s, 63 22 kb/s, 2 URLs in 1 queues
  157. 2013-01-12 05:38:11,905 INFO fetcher.FetcherJob - * queue: http://localhost
  158. 2013-01-12 05:38:11,905 INFO fetcher.FetcherJob - maxThreads = 1
  159. 2013-01-12 05:38:11,905 INFO fetcher.FetcherJob - inProgress = 0
  160. 2013-01-12 05:38:11,906 INFO fetcher.FetcherJob - crawlDelay = 5000
  161. 2013-01-12 05:38:11,906 INFO fetcher.FetcherJob - minCrawlDelay = 0
  162. 2013-01-12 05:38:11,906 INFO fetcher.FetcherJob - nextFetchTime = 1357943891933
  163. 2013-01-12 05:38:11,906 INFO fetcher.FetcherJob - now = 1357943891906
  164. 2013-01-12 05:38:11,906 INFO fetcher.FetcherJob - 0. http://localhost/
  165. 2013-01-12 05:38:11,906 INFO fetcher.FetcherJob - 1. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
  166. 2013-01-12 05:38:11,937 INFO fetcher.FetcherJob - fetching http://localhost/
  167. 2013-01-12 05:38:16,906 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 5 pages, 0 errors, 0.2 0.2 pages/s, 51 2 kb/s, 1 URLs in 1 queues
  168. 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - * queue: http://localhost
  169. 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - maxThreads = 1
  170. 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - inProgress = 0
  171. 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - crawlDelay = 5000
  172. 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - minCrawlDelay = 0
  173. 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - nextFetchTime = 1357943896945
  174. 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - now = 1357943896907
  175. 2013-01-12 05:38:16,907 INFO fetcher.FetcherJob - 0. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
  176. 2013-01-12 05:38:16,949 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
  177. 2013-01-12 05:38:16,954 INFO fetcher.FetcherJob - -finishing thread FetcherThread0, activeThreads=9
  178. 2013-01-12 05:38:17,416 INFO fetcher.FetcherJob - -finishing thread FetcherThread3, activeThreads=8
  179. 2013-01-12 05:38:17,416 INFO fetcher.FetcherJob - -finishing thread FetcherThread8, activeThreads=7
  180. 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread7, activeThreads=6
  181. 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread1, activeThreads=4
  182. 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread6, activeThreads=2
  183. 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread9, activeThreads=5
  184. 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread4, activeThreads=0
  185. 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread5, activeThreads=1
  186. 2013-01-12 05:38:17,417 INFO fetcher.FetcherJob - -finishing thread FetcherThread2, activeThreads=3
  187. 2013-01-12 05:38:21,908 INFO fetcher.FetcherJob - 0/0 spinwaiting/active, 6 pages, 0 errors, 0.2 0.2 pages/s, 46 22 kb/s, 0 URLs in 0 queues
  188. 2013-01-12 05:38:21,908 INFO fetcher.FetcherJob - -activeThreads=0
  189. 2013-01-12 05:38:24,882 WARN mapred.FileOutputCommitter - Output path is null in cleanup
  190. 2013-01-12 05:38:25,683 INFO parse.ParserJob - ParserJob: resuming: false
  191. 2013-01-12 05:38:25,683 INFO parse.ParserJob - ParserJob: forced reparse: false
  192. 2013-01-12 05:38:25,683 INFO parse.ParserJob - ParserJob: parsing all
  193. 2013-01-12 05:38:25,785 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
  194. 2013-01-12 05:38:26,066 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
  195. 2013-01-12 05:38:26,081 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  196. 2013-01-12 05:38:26,084 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
  197. 2013-01-12 05:38:26,102 INFO parse.ParserJob - Parsing http://localhost/
  198. 2013-01-12 05:38:26,110 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
  199. 2013-01-12 05:38:26,116 INFO parse.ParserJob - Parsing http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
  200. 2013-01-12 05:38:26,116 INFO parse.ParserFactory - The parsing plugins: [org.apache.nutch.parse.tika.TikaParser] are enabled via the plugin.includes system property, and all claim to support the content type application/vnd.oasis.opendocument.text, but they are not mapped to it in the parse-plugins.xml file
  201. 2013-01-12 05:38:26,251 INFO parse.ParserJob - Parsing http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
  202. 2013-01-12 05:38:27,428 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf of type application/pdf
  203. 2013-01-12 05:38:27,436 INFO parse.ParserJob - Parsing http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
  204. 2013-01-12 05:38:27,453 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt of type application/vnd.oasis.opendocument.text
  205. 2013-01-12 05:38:27,456 INFO parse.ParserJob - Parsing http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
  206. 2013-01-12 05:38:27,502 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf of type application/pdf
  207. 2013-01-12 05:38:27,508 INFO parse.ParserJob - Parsing http://localhost/sapi/Solr-install-v2.pdf
  208. 2013-01-12 05:38:27,508 WARN parse.ParserJob - http://localhost/sapi/Solr-install-v2.pdf skipped. Content of size 395125 was truncated to 65536
  209. 2013-01-12 05:38:28,132 WARN mapred.FileOutputCommitter - Output path is null in cleanup
  210. 2013-01-12 05:38:29,184 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
  211. 2013-01-12 05:38:29,372 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  212. 2013-01-12 05:38:29,372 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
  213. 2013-01-12 05:38:29,372 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
  214. 2013-01-12 05:38:29,372 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
  215. 2013-01-12 05:38:29,447 WARN mapred.FileOutputCommitter - Output path is null in cleanup
  216. 2013-01-12 05:38:30,337 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
  217. 2013-01-12 05:38:30,466 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
  218. 2013-01-12 05:38:30,467 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
  219. 2013-01-12 05:38:30,467 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
  220. 2013-01-12 05:38:30,476 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
  221. 2013-01-12 05:38:30,508 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  222. 2013-01-12 05:38:30,563 WARN mapred.FileOutputCommitter - Output path is null in cleanup
  223. 2013-01-12 05:38:31,203 INFO fetcher.FetcherJob - FetcherJob: threads: 10
  224. 2013-01-12 05:38:31,204 INFO fetcher.FetcherJob - FetcherJob: parsing: false
  225. 2013-01-12 05:38:31,204 INFO fetcher.FetcherJob - FetcherJob: resuming: false
  226. 2013-01-12 05:38:31,204 INFO fetcher.FetcherJob - FetcherJob : timelimit set for : -1
  227. 2013-01-12 05:38:31,205 INFO http.Http - http.proxy.host = null
  228. 2013-01-12 05:38:31,205 INFO http.Http - http.proxy.port = 8080
  229. 2013-01-12 05:38:31,205 INFO http.Http - http.timeout = 10000
  230. 2013-01-12 05:38:31,205 INFO http.Http - http.content.limit = 65536
  231. 2013-01-12 05:38:31,205 INFO http.Http - http.agent = nutch-solr-integration-test/1.6 (BW Web Crawler using Nutch 1.6; http://billydekid.wordpress.com/; bwidyasanyata@gmail.com)
  232. 2013-01-12 05:38:31,205 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
  233. 2013-01-12 05:38:31,205 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  234. 2013-01-12 05:38:31,329 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
  235. 2013-01-12 05:38:31,484 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  236. 2013-01-12 05:38:31,484 INFO fetcher.FetcherJob - Using queue mode : byHost
  237. 2013-01-12 05:38:31,484 INFO fetcher.FetcherJob - Fetcher: threads: 10
  238. 2013-01-12 05:38:31,505 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
  239. 2013-01-12 05:38:31,506 INFO http.Http - http.proxy.host = null
  240. 2013-01-12 05:38:31,506 INFO http.Http - http.proxy.port = 8080
  241. 2013-01-12 05:38:31,506 INFO http.Http - http.timeout = 10000
  242. 2013-01-12 05:38:31,506 INFO http.Http - http.content.limit = 65536
  243. 2013-01-12 05:38:31,506 INFO http.Http - http.agent = nutch-solr-integration-test/1.6 (BW Web Crawler using Nutch 1.6; http://billydekid.wordpress.com/; bwidyasanyata@gmail.com)
  244. 2013-01-12 05:38:31,506 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
  245. 2013-01-12 05:38:31,506 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  246. 2013-01-12 05:38:31,509 INFO fetcher.FetcherJob - Fetcher: throughput threshold: -1
  247. 2013-01-12 05:38:31,509 INFO fetcher.FetcherJob - Fetcher: throughput threshold sequence: 5
  248. 2013-01-12 05:38:31,511 INFO fetcher.FetcherJob - QueueFeeder finished: total 7 records. Hit by time limit :0
  249. 2013-01-12 05:38:36,509 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 64 64 kb/s, 6 URLs in 1 queues
  250. 2013-01-12 05:38:36,513 INFO fetcher.FetcherJob - fetching http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
  251. 2013-01-12 05:38:41,510 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 2 pages, 0 errors, 0.2 0.2 pages/s, 64 64 kb/s, 5 URLs in 1 queues
  252. 2013-01-12 05:38:41,520 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Solr-install-v2.pdf
  253. 2013-01-12 05:38:46,510 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 3 pages, 0 errors, 0.2 0.2 pages/s, 76 102 kb/s, 4 URLs in 1 queues
  254. 2013-01-12 05:38:46,510 INFO fetcher.FetcherJob - * queue: http://localhost
  255. 2013-01-12 05:38:46,510 INFO fetcher.FetcherJob - maxThreads = 1
  256. 2013-01-12 05:38:46,510 INFO fetcher.FetcherJob - inProgress = 0
  257. 2013-01-12 05:38:46,510 INFO fetcher.FetcherJob - crawlDelay = 5000
  258. 2013-01-12 05:38:46,510 INFO fetcher.FetcherJob - minCrawlDelay = 0
  259. 2013-01-12 05:38:46,511 INFO fetcher.FetcherJob - nextFetchTime = 1357943926525
  260. 2013-01-12 05:38:46,511 INFO fetcher.FetcherJob - now = 1357943926511
  261. 2013-01-12 05:38:46,511 INFO fetcher.FetcherJob - 0. http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
  262. 2013-01-12 05:38:46,511 INFO fetcher.FetcherJob - 1. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
  263. 2013-01-12 05:38:46,511 INFO fetcher.FetcherJob - 2. http://localhost/
  264. 2013-01-12 05:38:46,511 INFO fetcher.FetcherJob - 3. http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
  265. 2013-01-12 05:38:46,528 INFO fetcher.FetcherJob - fetching http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
  266. 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 4 pages, 0 errors, 0.2 0.2 pages/s, 73 64 kb/s, 3 URLs in 1 queues
  267. 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - * queue: http://localhost
  268. 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - maxThreads = 1
  269. 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - inProgress = 0
  270. 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - crawlDelay = 5000
  271. 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - minCrawlDelay = 0
  272. 2013-01-12 05:38:51,511 INFO fetcher.FetcherJob - nextFetchTime = 1357943931531
  273. 2013-01-12 05:38:51,512 INFO fetcher.FetcherJob - now = 1357943931512
  274. 2013-01-12 05:38:51,512 INFO fetcher.FetcherJob - 0. http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
  275. 2013-01-12 05:38:51,512 INFO fetcher.FetcherJob - 1. http://localhost/
  276. 2013-01-12 05:38:51,512 INFO fetcher.FetcherJob - 2. http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
  277. 2013-01-12 05:38:51,533 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
  278. 2013-01-12 05:38:56,512 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 5 pages, 0 errors, 0.2 0.2 pages/s, 63 22 kb/s, 2 URLs in 1 queues
  279. 2013-01-12 05:38:56,512 INFO fetcher.FetcherJob - * queue: http://localhost
  280. 2013-01-12 05:38:56,512 INFO fetcher.FetcherJob - maxThreads = 1
  281. 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - inProgress = 0
  282. 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - crawlDelay = 5000
  283. 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - minCrawlDelay = 0
  284. 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - nextFetchTime = 1357943936535
  285. 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - now = 1357943936513
  286. 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - 0. http://localhost/
  287. 2013-01-12 05:38:56,513 INFO fetcher.FetcherJob - 1. http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
  288. 2013-01-12 05:38:56,538 INFO fetcher.FetcherJob - fetching http://localhost/
  289. 2013-01-12 05:39:01,513 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, 6 pages, 0 errors, 0.2 0.2 pages/s, 53 2 kb/s, 1 URLs in 1 queues
  290. 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - * queue: http://localhost
  291. 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - maxThreads = 1
  292. 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - inProgress = 0
  293. 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - crawlDelay = 5000
  294. 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - minCrawlDelay = 0
  295. 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - nextFetchTime = 1357943941546
  296. 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - now = 1357943941514
  297. 2013-01-12 05:39:01,514 INFO fetcher.FetcherJob - 0. http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
  298. 2013-01-12 05:39:01,549 INFO fetcher.FetcherJob - fetching http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
  299. 2013-01-12 05:39:01,552 INFO fetcher.FetcherJob - -finishing thread FetcherThread6, activeThreads=9
  300. 2013-01-12 05:39:02,023 INFO fetcher.FetcherJob - -finishing thread FetcherThread4, activeThreads=7
  301. 2013-01-12 05:39:02,023 INFO fetcher.FetcherJob - -finishing thread FetcherThread5, activeThreads=7
  302. 2013-01-12 05:39:02,024 INFO fetcher.FetcherJob - -finishing thread FetcherThread1, activeThreads=6
  303. 2013-01-12 05:39:02,024 INFO fetcher.FetcherJob - -finishing thread FetcherThread8, activeThreads=4
  304. 2013-01-12 05:39:02,024 INFO fetcher.FetcherJob - -finishing thread FetcherThread3, activeThreads=5
  305. 2013-01-12 05:39:02,029 INFO fetcher.FetcherJob - -finishing thread FetcherThread0, activeThreads=2
  306. 2013-01-12 05:39:02,029 INFO fetcher.FetcherJob - -finishing thread FetcherThread2, activeThreads=2
  307. 2013-01-12 05:39:02,030 INFO fetcher.FetcherJob - -finishing thread FetcherThread9, activeThreads=1
  308. 2013-01-12 05:39:02,030 INFO fetcher.FetcherJob - -finishing thread FetcherThread7, activeThreads=0
  309. 2013-01-12 05:39:06,515 INFO fetcher.FetcherJob - 0/0 spinwaiting/active, 7 pages, 0 errors, 0.2 0.2 pages/s, 48 22 kb/s, 0 URLs in 0 queues
  310. 2013-01-12 05:39:06,515 INFO fetcher.FetcherJob - -activeThreads=0
  311. 2013-01-12 05:39:06,965 WARN mapred.FileOutputCommitter - Output path is null in cleanup
  312. 2013-01-12 05:39:07,300 INFO parse.ParserJob - ParserJob: resuming: false
  313. 2013-01-12 05:39:07,300 INFO parse.ParserJob - ParserJob: forced reparse: false
  314. 2013-01-12 05:39:07,300 INFO parse.ParserJob - ParserJob: parsing all
  315. 2013-01-12 05:39:07,374 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
  316. 2013-01-12 05:39:07,525 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
  317. 2013-01-12 05:39:07,541 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  318. 2013-01-12 05:39:07,556 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
  319. 2013-01-12 05:39:07,575 INFO parse.ParserJob - Parsing http://localhost/
  320. 2013-01-12 05:39:07,583 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
  321. 2013-01-12 05:39:07,594 INFO parse.ParserJob - Parsing http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
  322. 2013-01-12 05:39:07,594 INFO parse.ParserFactory - The parsing plugins: [org.apache.nutch.parse.tika.TikaParser] are enabled via the plugin.includes system property, and all claim to support the content type application/vnd.oasis.opendocument.text, but they are not mapped to it in the parse-plugins.xml file
  323. 2013-01-12 05:39:07,704 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt of type application/vnd.oasis.opendocument.text
  324. 2013-01-12 05:39:07,708 INFO parse.ParserJob - Parsing http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
  325. 2013-01-12 05:39:07,763 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf of type application/pdf
  326. 2013-01-12 05:39:07,770 INFO parse.ParserJob - Parsing http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
  327. 2013-01-12 05:39:07,797 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt of type application/vnd.oasis.opendocument.text
  328. 2013-01-12 05:39:07,801 INFO parse.ParserJob - Parsing http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
  329. 2013-01-12 05:39:07,873 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf of type application/pdf
  330. 2013-01-12 05:39:07,880 INFO parse.ParserJob - Parsing http://localhost/sapi/Solr-install-v2.pdf
  331. 2013-01-12 05:39:07,880 WARN parse.ParserJob - http://localhost/sapi/Solr-install-v2.pdf skipped. Content of size 395125 was truncated to 65536
  332. 2013-01-12 05:39:07,881 INFO parse.ParserJob - Parsing http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
  333. 2013-01-12 05:39:07,931 WARN parse.ParseUtil - Unable to successfully parse content http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf of type application/pdf
  334. 2013-01-12 05:39:08,062 WARN mapred.FileOutputCommitter - Output path is null in cleanup
  335. 2013-01-12 05:39:08,634 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
  336. 2013-01-12 05:39:09,469 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
  337. 2013-01-12 05:39:09,469 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
  338. 2013-01-12 05:39:09,469 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
  339. 2013-01-12 05:39:09,469 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
  340. 2013-01-12 05:39:09,731 WARN mapred.FileOutputCommitter - Output path is null in cleanup