Untitled

a guest
Oct 25th, 2017
[hbase@sandbox bin]$ echo 'https://docs.oracle.com/javase/7/docs/api/' > seeds.txt
[hbase@sandbox bin]$ ./nutch inject seeds.txt
InjectorJob: starting at 2017-10-25 09:25:28
InjectorJob: Injecting urlDir: seeds.txt
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2017-10-25 09:25:36, elapsed: 00:00:07
[hbase@sandbox bin]$ ./nutch generate -topN 100
GeneratorJob: starting at 2017-10-25 09:26:05
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 100
GeneratorJob: finished at 2017-10-25 09:26:26, time elapsed: 00:00:21
GeneratorJob: generated batch id: 1508923565-2134519272 containing 1 URLs
[hbase@sandbox bin]$ ./nutch fetch -all
FetcherJob: starting at 2017-10-25 09:26:40
FetcherJob: fetching all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 1 records. Hit by time limit :0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
-finishing thread FetcherThread1, activeThreads=9
-finishing thread FetcherThread7, activeThreads=8
-finishing thread FetcherThread6, activeThreads=7
-finishing thread FetcherThread5, activeThreads=6
-finishing thread FetcherThread2, activeThreads=5
-finishing thread FetcherThread4, activeThreads=4
-finishing thread FetcherThread3, activeThreads=3
fetching https://docs.oracle.com/javase/7/docs/api/ (queue crawl delay=5000ms)
-finishing thread FetcherThread8, activeThreads=2
-finishing thread FetcherThread9, activeThreads=1
0/1 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 1 queues
-finishing thread FetcherThread0, activeThreads=0
0/0 spinwaiting/active, 1 pages, 0 errors, 0.1 0 pages/s, 2 4 kb/s, 0 URLs in 0 queues
-activeThreads=0
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 0 records. Hit by time limit :0
-finishing thread FetcherThread0, activeThreads=0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
-finishing thread FetcherThread1, activeThreads=7
-finishing thread FetcherThread2, activeThreads=6
-finishing thread FetcherThread3, activeThreads=5
-finishing thread FetcherThread4, activeThreads=4
-finishing thread FetcherThread5, activeThreads=3
-finishing thread FetcherThread6, activeThreads=2
-finishing thread FetcherThread7, activeThreads=1
-finishing thread FetcherThread8, activeThreads=0
-finishing thread FetcherThread9, activeThreads=0
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
FetcherJob: finished at 2017-10-25 09:27:06, time elapsed: 00:00:25
[hbase@sandbox bin]$ hbase shell
2017-10-25 09:27:27,494 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.4.2.2.4.2-2-hadoop2, rdd8a499345afc1ac49dc5ef212ba64b23abfe110, Tue Mar 31 16:18:12 EDT 2015

hbase(main):001:0> list
TABLE
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
iemployee
webpage
2 row(s) in 4.6180 seconds

=> ["iemployee", "webpage"]
hbase(main):002:0> scan 'webpage'
ROW COLUMN+CELL
com.oracle.docs:https/javase/7/docs/api/ column=f:bas, timestamp=1508923620099, value=https://docs.oracle.com/javase/7/docs/api/
com.oracle.docs:https/javase/7/docs/api/ column=f:bid, timestamp=1508923585567, value=1508923565-2134519272
com.oracle.docs:https/javase/7/docs/api/ column=f:cnt, timestamp=1508923620099, value=<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/T
R/html4/frameset.dtd">\x0A<!-- NewPage -->\x0A<html lang="en">\x0A<head>\x0A<!-- Generated by javadoc on Mon Oct 09 00:19:07
PDT 2017 -->\x0A<title>Java Platform SE 7 </title>\x0A<script type="text/javascript">\x0A tmpTargetPage = "" + window.lo
cation.search;\x0A if (tmpTargetPage != "" && tmpTargetPage != "undefined")\x0A tmpTargetPage = tmpTargetPage.subs
tring(1);\x0A if (tmpTargetPage.indexOf(":") != -1 || (tmpTargetPage != "" && !validURL(tmpTargetPage)))\x0A tmpTa
rgetPage = "undefined";\x0A targetPage = tmpTargetPage;\x0A function validURL(url) {\x0A try {\x0A u
rl = decodeURIComponent(url);\x0A }\x0A catch (error) {\x0A return false;\x0A }\x0A v
ar pos = url.indexOf(".html");\x0A if (pos == -1 || pos != url.length - 5)\x0A return false;\x0A va
r allowNumber = false;\x0A var allowSep = false;\x0A var seenDot = false;\x0A for (var i = 0; i < url.l
ength - 5; i++) {\x0A var ch = url.charAt(i);\x0A if ('a' <= ch && ch <= 'z' ||\x0A
'A' <= ch && ch <= 'Z' ||\x0A ch == '$' ||\x0A ch == '_' ||\x0A ch
.charCodeAt(0) > 127) {\x0A allowNumber = true;\x0A allowSep = true;\x0A } else if
('0' <= ch && ch <= '9'\x0A || ch == '-') {\x0A if (!allowNumber)\x0A
return false;\x0A } else if (ch == '/' || ch == '.') {\x0A if (!allowSep)\x0A r
eturn false;\x0A allowNumber = false;\x0A allowSep = false;\x0A if (ch == '.')\
x0A seenDot = true;\x0A if (ch == '/' && seenDot)\x0A return false;\x
0A } else {\x0A return false;\x0A }\x0A }\x0A return true;\x0A }\x0A
function loadFrames() {\x0A if (targetPage != "" && targetPage != "undefined")\x0A top.classFrame.locat
ion = top.targetPage;\x0A }\x0A</script>\x0A</head>\x0A<frameset cols="20%,80%" title="Documentation frame" onload="top.l
oadFrames()">\x0A<frameset rows="30%,70%" title="Left frames" onload="top.loadFrames()">\x0A<frame src="overview-frame.html"
name="packageListFrame" title="All Packages">\x0A<frame src="allclasses-frame.html" name="packageFrame" title="All classes
and interfaces (except non-static nested types)">\x0A</frameset>\x0A<frame src="overview-summary.html" name="classFrame" tit
le="Package, class and interface descriptions" scrolling="yes">\x0A<noframes>\x0A<noscript>\x0A<div>JavaScript is disabled o
n your browser.</div>\x0A</noscript>\x0A<h2>Frame Alert</h2>\x0A<p>This document is designed to be viewed using the frames f
eature. If you see this message, you are using a non-frame-capable web client. Link to <a href="overview-summary.html">Non-f
rame version</a>.</p>\x0A</noframes>\x0A</frameset>\x0A</html>\x0A
com.oracle.docs:https/javase/7/docs/api/ column=f:fi, timestamp=1508923536028, value=\x00'\x8D\x00
com.oracle.docs:https/javase/7/docs/api/ column=f:prot, timestamp=1508923620099, value=\x02\x00\x00
com.oracle.docs:https/javase/7/docs/api/ column=f:pts, timestamp=1508923620099, value=\x00\x00\x01_R\xD9\xD4\xB1
com.oracle.docs:https/javase/7/docs/api/ column=f:st, timestamp=1508923620099, value=\x00\x00\x00\x02
com.oracle.docs:https/javase/7/docs/api/ column=f:ts, timestamp=1508923620099, value=\x00\x00\x01_R\xDB5D
com.oracle.docs:https/javase/7/docs/api/ column=f:typ, timestamp=1508923620099, value=text/html
com.oracle.docs:https/javase/7/docs/api/ column=h:Accept-Ranges, timestamp=1508923620099, value=bytes
com.oracle.docs:https/javase/7/docs/api/ column=h:Connection, timestamp=1508923620099, value=close
com.oracle.docs:https/javase/7/docs/api/ column=h:Content-Encoding, timestamp=1508923620099, value=gzip
com.oracle.docs:https/javase/7/docs/api/ column=h:Content-Length, timestamp=1508923620099, value=1083
com.oracle.docs:https/javase/7/docs/api/ column=h:Content-Type, timestamp=1508923620099, value=text/html
com.oracle.docs:https/javase/7/docs/api/ column=h:Date, timestamp=1508923620099, value=Wed, 25 Oct 2017 09:26:58 GMT
com.oracle.docs:https/javase/7/docs/api/ column=h:ETag, timestamp=1508923620099, value="e6133d8aad7082b3c3290041f83cc357:1508252293"
com.oracle.docs:https/javase/7/docs/api/ column=h:Last-Modified, timestamp=1508923620099, value=Thu, 12 Oct 2017 04:20:15 GMT
com.oracle.docs:https/javase/7/docs/api/ column=h:Server, timestamp=1508923620099, value=Apache
com.oracle.docs:https/javase/7/docs/api/ column=h:Vary, timestamp=1508923620099, value=Accept-Encoding
com.oracle.docs:https/javase/7/docs/api/ column=mk:_ftcmrk_, timestamp=1508923620099, value=1508923565-2134519272
com.oracle.docs:https/javase/7/docs/api/ column=mk:_gnmrk_, timestamp=1508923620099, value=1508923565-2134519272
com.oracle.docs:https/javase/7/docs/api/ column=mk:_injmrk_, timestamp=1508923620099, value=y
com.oracle.docs:https/javase/7/docs/api/ column=mk:dist, timestamp=1508923620099, value=0
com.oracle.docs:https/javase/7/docs/api/ column=mtdt:_rs_, timestamp=1508923620099, value=\x00\x00\x06L
com.oracle.docs:https/javase/7/docs/api/ column=s:s, timestamp=1508923536028, value=?\x80\x00\x00
1 row(s) in 0.4910 seconds

hbase(main):003:0> exit
[hbase@sandbox bin]$ ./nutch parse -all
ParserJob: starting at 2017-10-25 09:28:14
ParserJob: resuming:false
ParserJob: forced reparse:false
ParserJob: parsing all
Parsing https://docs.oracle.com/javase/7/docs/api/
ParserJob: success
ParserJob: finished at 2017-10-25 09:28:32, time elapsed: 00:00:17
[hbase@sandbox bin]$ ./nutch updatingdb -all
Error: Could not find or load main class updatingdb
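Note on the failed step above: `updatingdb` is not a Nutch subcommand, so the `nutch` script appears to have handed the word to Java as a class name, which would explain "Could not find or load main class". In Nutch 2.x the update step is spelled `updatedb`. The sketch below replays the cycle from this session with that spelling. `NUTCH`, `DRY_RUN`, `run` and `crawl_cycle` are hypothetical names introduced for this example only; the subcommands and flags are the ones typed in the session, apart from the corrected `updatedb`.

```shell
#!/bin/sh
# Sketch of the crawl cycle from the session above.
# NUTCH and DRY_RUN are hypothetical knobs, not part of Nutch itself.
NUTCH="${NUTCH:-./nutch}"

# Run one nutch subcommand, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$NUTCH $*"
  else
    "$NUTCH" "$@"
  fi
}

# Each step runs only if the previous one succeeded.
crawl_cycle() {
  run inject seeds.txt &&
  run generate -topN 100 &&
  run fetch -all &&
  run parse -all &&
  run updatedb -all   # corrected spelling; the session typed "updatingdb"
}
```

Calling `DRY_RUN=1 crawl_cycle` only prints the five commands, which is a cheap way to check the spelling before touching HBase.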
[hbase@sandbox bin]$ ./nutch solrindex http://127.0.0.1:8983/solr/#/collection1 -all
IndexingJob: starting
Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : username for authentication
solr.auth.password : password for authentication


SolrIndexerJob: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected content type application/octet-stream but got text/html;charset=ISO-8859-1. <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 405 HTTP method POST is not supported by this URL</title>
</head>
<body><h2>HTTP ERROR 405</h2>
<p>Problem accessing /solr/admin.html. Reason:
<pre> HTTP method POST is not supported by this URL</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

</body>
</html>

at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:455)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:146)
at org.apache.nutch.indexer.IndexWriters.commit(IndexWriters.java:124)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:186)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:202)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:211)

[hbase@sandbox bin]$
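
Note on the 405 above: the URL passed to `solrindex` looks like one copied from the Solr admin UI. The `#/collection1` part is a browser-side fragment, and URL fragments are not sent to the server, so the client most likely posted to `/solr/` itself, which serves the admin page (hence `/solr/admin.html` in the error). The indexer probably needs the core's base URL instead. The helper below is a minimal sketch of that conversion; `solr_core_url` is a hypothetical name, and it assumes the core really is called `collection1` as in the pasted URL.

```shell
# solr_core_url: turn a Solr admin-UI URL (".../solr/#/core") into the
# core's base URL (".../solr/core"). Hypothetical helper for illustration.
solr_core_url() {
  printf '%s\n' "$1" | sed 's|/#/|/|'
}

# Print the command that would retry indexing against the core URL.
echo "./nutch solrindex $(solr_core_url 'http://127.0.0.1:8983/solr/#/collection1') -all"
```

This only prints the retry command rather than running it, so the resulting URL can be checked against the Solr instance first.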