Log

By: a guest | Mar 16th, 2010 | Syntax: None | Size: 6.64 KB | Hits: 148 | Expires: Never
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
topN = 2
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20100317120411
Generator: filtering: true
Generator: topN: 2
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting
Fetcher: segment: crawl/segments/20100317120411
Fetcher: threads: 10
QueueFeeder finished: total 1 records.
fetching http://thestar.com.my/
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawl/crawldb
CrawlDb update: segments: [crawl/segments/20100317120411]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20100317120421
Generator: filtering: true
Generator: topN: 2
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting
Fetcher: segment: crawl/segments/20100317120421
Fetcher: threads: 10
QueueFeeder finished: total 2 records.
fetching http://thestar.com.my/news/story.asp?file=/2010/3/17/nation/5873356&sec=nation
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://203.115.194.20
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1268798666076
  now           = 1268798665960
  0. http://thestar.com.my/news/story.asp?file=/2010/3/17/nation/20100317104521&sec=nation
fetching http://thestar.com.my/news/story.asp?file=/2010/3/17/nation/20100317104521&sec=nation
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawl/crawldb
CrawlDb update: segments: [crawl/segments/20100317120421]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20100317120431
Generator: filtering: true
Generator: topN: 2
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting
Fetcher: segment: crawl/segments/20100317120431
Fetcher: threads: 10
QueueFeeder finished: total 2 records.
fetching http://thestar.com.my/news/story.asp?file=/2010/3/17/nation/5878747&sec=nation
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://203.115.194.20
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1268798676585
  now           = 1268798676215
  0. http://thestar.com.my/news/nation/
fetching http://thestar.com.my/news/nation/
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawl/crawldb
CrawlDb update: segments: [crawl/segments/20100317120431]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/C:/nutch-1.0/crawl/segments/20100317120411
LinkDb: adding segment: file:/C:/nutch-1.0/crawl/segments/20100317120421
LinkDb: adding segment: file:/C:/nutch-1.0/crawl/segments/20100317120431
LinkDb: done
Indexer: starting
Indexer: done
Dedup: starting
Dedup: adding indexes in: crawl/indexes
Dedup: done
merging indexes to: crawl/index
Adding file:/C:/nutch-1.0/crawl/indexes/part-00000
done merging
crawl finished: crawl
Found 1 hits

Html content:
<ul>
<li>boost = 0.14707822</li>
<li>digest = 02930a5240bf62309821cfec88b819b3</li>
<li>segment = 20100317120431</li>
<li>title = The Star Online: Nation</li>
<li>tstamp = 20100317040436985</li>
<li>url = http://thestar.com.my/news/nation/</li>
</ul>

Created html file
Start open calais web service.....
End open calais web service.....
Title is: The Star Online: Nation
(http://thestar.com.my/news/nation/)
Date Fetched: Wed Mar 17 12:04:36 SGT 2010
 ... promise, UN sec-gen tell rich countries Rich nations have not kept their promises ...

----------------------------------------
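
Note on the two "* queue:" dumps in the log: in both, all 10 threads are spin-waiting while one URL remains queued for the single host queue (maxThreads = 1). This is consistent with the fetcher holding the next URL back until nextFetchTime. Below is a minimal sketch of that arithmetic, assuming nextFetchTime and now are epoch milliseconds (their magnitude suggests so) and that the queue releases its next URL once now reaches nextFetchTime. The names queue_dumps and remaining_wait_ms are hypothetical and not part of Nutch; the numbers are copied from the log.

```python
# Values copied from the two "* queue:" dumps in the log.
# Assumption: all times are epoch milliseconds.
queue_dumps = [
    {"crawl_delay_ms": 1000, "next_fetch_time": 1268798666076, "now": 1268798665960},
    {"crawl_delay_ms": 1000, "next_fetch_time": 1268798676585, "now": 1268798676215},
]


def remaining_wait_ms(dump):
    """Milliseconds until the queue may release its next URL (never negative)."""
    return max(0, dump["next_fetch_time"] - dump["now"])


waits = [remaining_wait_ms(d) for d in queue_dumps]
for d, w in zip(queue_dumps, waits):
    print(f"{w} ms left of a {d['crawl_delay_ms']} ms crawl delay")
```

Under these assumptions the remaining waits are 116 ms and 370 ms, both within the 1000 ms crawlDelay, so the spin-waiting looks like normal per-host politeness rather than a stall.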