  1. Last login: Tue Oct 18 09:22:09 on ttys001
  2. fireelemental:ApacheZeppelin konradm$ ll
  3. total 144
  4. -rw-r--r--   1 konradm  KAINOS\Domain Users    15K Sep 13 15:40 LICENSE
  5. -rw-r--r--   1 konradm  KAINOS\Domain Users   253B Sep 13 15:40 NOTICE
  6. -rw-r--r--   1 konradm  KAINOS\Domain Users   9.1K Sep 13 15:40 README.md
  7. -rw-r--r--   1 konradm  KAINOS\Domain Users   115B Sep 13 15:40 Roadmap.md
  8. -rw-r--r--   1 konradm  KAINOS\Domain Users   2.7K Sep 13 15:40 SECURITY-README.md
  9. -rw-r--r--   1 konradm  KAINOS\Domain Users   2.8K Sep 13 15:40 STYLE.md
  10. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 15:40 _tools
  11. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:32 alluxio
  12. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:29 angular
  13. drwxr-xr-x   6 konradm  KAINOS\Domain Users   204B Sep 13 16:32 bigquery
  14. drwxr-xr-x  12 konradm  KAINOS\Domain Users   408B Sep 13 15:40 bin
  15. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:31 cassandra
  16. drwxr-xr-x  15 konradm  KAINOS\Domain Users   510B Sep 21 14:36 conf
  17. drwxr-xr-x   9 konradm  KAINOS\Domain Users   306B Sep 13 15:40 dev
  18. drwxr-xr-x  30 konradm  KAINOS\Domain Users   1.0K Sep 13 15:40 docs
  19. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:32 elasticsearch
  20. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:30 file
  21. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:30 flink
  22. drwxr-xr-x   4 konradm  KAINOS\Domain Users   136B Sep 13 15:44 geode
  23. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:29 hbase
  24. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:30 ignite
  25. drwxr-xr-x  20 root     KAINOS\Domain Users   680B Sep 13 16:32 interpreter
  26. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:30 jdbc
  27. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:30 kylin
  28. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:30 lens
  29. drwxr-xr-x  14 konradm  KAINOS\Domain Users   476B Sep 13 15:40 licenses
  30. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:29 livy
  31. drwxr-xr-x   4 konradm  KAINOS\Domain Users   136B Sep 28 19:25 local-repo
  32. drwxr-xr-x  47 konradm  KAINOS\Domain Users   1.6K Oct  4 21:41 logs
  33. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:29 markdown
  34. drwxr-xr-x   9 konradm  KAINOS\Domain Users   306B Oct  4 21:40 notebook
  35. -rw-r--r--   1 konradm  KAINOS\Domain Users    26K Sep 13 15:44 pom.xml
  36. drwxr-xr-x   6 konradm  KAINOS\Domain Users   204B Sep 13 16:29 postgresql
  37. drwxr-xr-x   6 konradm  KAINOS\Domain Users   204B Sep 13 16:30 python
  38. drwxr-xr-x   6 konradm  KAINOS\Domain Users   204B Sep 13 15:44 r
  39. drwxr-xr-x   2 konradm  KAINOS\Domain Users    68B Oct  4 23:31 run
  40. drwxr-xr-x   4 konradm  KAINOS\Domain Users   136B Sep 13 15:44 scalding
  41. drwxr-xr-x   4 konradm  KAINOS\Domain Users   136B Sep 13 15:40 scripts
  42. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:29 shell
  43. drwxr-xr-x   6 konradm  KAINOS\Domain Users   204B Sep 13 16:29 spark
  44. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:28 spark-dependencies
  45. drwxr-xr-x   7 root     KAINOS\Domain Users   238B Sep 13 16:26 target
  46. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 15:40 testing
  47. drwxr-xr-x   3 konradm  KAINOS\Domain Users   102B Sep 13 15:47 var
  48. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:26 zeppelin-display
  49. drwxr-xr-x   7 konradm  KAINOS\Domain Users   238B Sep 13 16:36 zeppelin-distribution
  50. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 15:44 zeppelin-examples
  51. drwxr-xr-x   6 konradm  KAINOS\Domain Users   204B Sep 13 16:26 zeppelin-interpreter
  52. drwxr-xr-x   5 konradm  KAINOS\Domain Users   170B Sep 13 16:35 zeppelin-server
  53. drwxr-xr-x  22 konradm  KAINOS\Domain Users   748B Sep 13 16:35 zeppelin-web
  54. drwxr-xr-x   7 konradm  KAINOS\Domain Users   238B Sep 13 16:26 zeppelin-zengine
  55. fireelemental:ApacheZeppelin konradm$ cd shell/
  56. fireelemental:shell konradm$ ll
  57. total 16
  58. -rw-r--r--   1 konradm  KAINOS\Domain Users   4.1K Sep 13 15:44 pom.xml
  59. drwxr-xr-x   4 konradm  KAINOS\Domain Users   136B Sep 13 15:40 src
  60. drwxr-xr-x  15 root     KAINOS\Domain Users   510B Sep 13 16:29 target
  61. fireelemental:shell konradm$ cd src/
  62. fireelemental:src konradm$ ll
  63. total 0
  64. drwxr-xr-x  4 konradm  KAINOS\Domain Users   136B Sep 13 15:40 main
  65. drwxr-xr-x  3 konradm  KAINOS\Domain Users   102B Sep 13 15:40 test
  66. fireelemental:src konradm$ cd main/\
  67. >
  68. fireelemental:src konradm$ cd main/
  69. fireelemental:main konradm$ ll
  70. total 0
  71. drwxr-xr-x  3 konradm  KAINOS\Domain Users   102B Sep 13 15:40 java
  72. drwxr-xr-x  3 konradm  KAINOS\Domain Users   102B Sep 13 15:40 resources
  73. fireelemental:main konradm$ cd java
  74. fireelemental:java konradm$ ll
  75. total 0
  76. drwxr-xr-x  3 konradm  KAINOS\Domain Users   102B Sep 13 15:40 org
  77. fireelemental:java konradm$ cd org/apache/zeppelin/shell/
  78. fireelemental:shell konradm$ ll
  79. total 16
  80. -rw-r--r--  1 konradm  KAINOS\Domain Users   5.2K Sep 13 15:40 ShellInterpreter.java
  81. drwxr-xr-x  3 konradm  KAINOS\Domain Users   102B Sep 13 15:40 security
  82. fireelemental:shell konradm$ cd $ZEPPELIN
  83. $ZEPPELIN       $ZEPPELIN_HOME  
  84. fireelemental:shell konradm$ cd $ZEPPELIN
  85. $ZEPPELIN       $ZEPPELIN_HOME  
  86. fireelemental:shell konradm$ cd $ZEPPELIN
  87. -bash: cd: /Users/konradm/ApacheZeppelin/bin/zeppelin-daemon.sh: Not a directory
  88. fireelemental:shell konradm$ cd
  89. fireelemental:~ konradm$ cd ApacheZeppelin/bin
  90. fireelemental:bin konradm$ ll
  91. total 112
  92. -rw-r--r--  1 konradm  KAINOS\Domain Users   2.8K Sep 13 15:40 common.cmd
  93. -rw-r--r--  1 konradm  KAINOS\Domain Users   4.0K Sep 13 15:40 common.sh
  94. -rw-r--r--  1 konradm  KAINOS\Domain Users   1.2K Sep 13 15:40 functions.cmd
  95. -rw-r--r--  1 konradm  KAINOS\Domain Users   4.0K Sep 13 15:40 functions.sh
  96. -rwxr-xr-x  1 konradm  KAINOS\Domain Users   1.7K Sep 13 15:40 install-interpreter.sh
  97. -rw-r--r--  1 konradm  KAINOS\Domain Users   5.0K Sep 13 15:40 interpreter.cmd
  98. -rwxr-xr-x  1 konradm  KAINOS\Domain Users   6.0K Sep 13 15:40 interpreter.sh
  99. -rwxr-xr-x  1 konradm  KAINOS\Domain Users   6.6K Sep 13 15:40 zeppelin-daemon.sh
  100. -rw-r--r--  1 konradm  KAINOS\Domain Users   3.1K Sep 13 15:40 zeppelin.cmd
  101. -rwxr-xr-x  1 konradm  KAINOS\Domain Users   2.9K Sep 13 15:40 zeppelin.sh
  102. fireelemental:bin konradm$ more zeppelin-daemon.sh
  103. #!/bin/bash
  104. #
  105. # Licensed to the Apache Software Foundation (ASF) under one
  106. # or more contributor license agreements.  See the NOTICE file
  107. # distributed with this work for additional information
  108. # regarding copyright ownership.  The ASF licenses this file
  109. # to you under the Apache License, Version 2.0 (the
  110. # "License"); you may not use this file except in compliance
  111. # with the License.  You may obtain a copy of the License at
  112. #
  113. #     http://www.apache.org/licenses/LICENSE-2.0
  114. #
  115. # Unless required by applicable law or agreed to in writing, software
  116. # distributed under the License is distributed on an "AS IS" BASIS,
  117. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  118. # See the License for the specific language governing permissions and
  119. # limitations under the License.
  120. #
  121. # description: Start and stop daemon script for.
  122. #
  123.  
  124. USAGE="-e Usage: zeppelin-daemon.sh\n\t
  125.        [--config <conf-dir>] {start|stop|upstart|restart|reload|status}\n\t
  126.        [--version | -v]"
  127.  
  128. if [[ "$1" == "--config" ]]; then
  129.   shift
  130.   conf_dir="$1"
  131.   if [[ ! -d "${conf_dir}" ]]; then
  132.     echo "ERROR : ${conf_dir} is not a directory"
  133.     echo ${USAGE}
  134.     exit 1
  135.   else
  136.     export ZEPPELIN_CONF_DIR="${conf_dir}"
  137.   fi
  138.   shift
  139. fi
  140.  
  141. if [ -L ${BASH_SOURCE-$0} ]; then
  142.   BIN=$(dirname $(readlink "${BASH_SOURCE-$0}"))
  143. else
  144.   BIN=$(dirname ${BASH_SOURCE-$0})
  145. fi
  146. BIN=$(cd "${BIN}">/dev/null; pwd)
  147.  
  148. . "${BIN}/common.sh"
  149. . "${BIN}/functions.sh"
  150.  
  151. HOSTNAME=$(hostname)
  152. ZEPPELIN_NAME="Zeppelin"
  153. ZEPPELIN_LOGFILE="${ZEPPELIN_LOG_DIR}/zeppelin-${ZEPPELIN_IDENT_STRING}-${HOSTNAME}.log"
  154. ZEPPELIN_OUTFILE="${ZEPPELIN_LOG_DIR}/zeppelin-${ZEPPELIN_IDENT_STRING}-${HOSTNAME}.out"
  155. ZEPPELIN_PID="${ZEPPELIN_PID_DIR}/zeppelin-${ZEPPELIN_IDENT_STRING}-${HOSTNAME}.pid"
  156. ZEPPELIN_MAIN=org.apache.zeppelin.server.ZeppelinServer
  157. JAVA_OPTS+=" -Dzeppelin.log.file=${ZEPPELIN_LOGFILE}"
  158.  
  159. # construct classpath
  160. if [[ -d "${ZEPPELIN_HOME}/zeppelin-interpreter/target/classes" ]]; then
  161.   ZEPPELIN_CLASSPATH+=":${ZEPPELIN_HOME}/zeppelin-interpreter/target/classes"
  162. fi
  163.  
  164. if [[ -d "${ZEPPELIN_HOME}/zeppelin-zengine/target/classes" ]]; then
  165.   ZEPPELIN_CLASSPATH+=":${ZEPPELIN_HOME}/zeppelin-zengine/target/classes"
  166. fi
  167.  
  168. if [[ -d "${ZEPPELIN_HOME}/zeppelin-server/target/classes" ]]; then
  169.   ZEPPELIN_CLASSPATH+=":${ZEPPELIN_HOME}/zeppelin-server/target/classes"
  170. fi
  171.  
  172. # Add jdbc connector jar
  173. # ZEPPELIN_CLASSPATH+=":${ZEPPELIN_HOME}/jdbc/jars/jdbc-connector-jar"
  174.  
  175. addJarInDir "${ZEPPELIN_HOME}"
  176. addJarInDir "${ZEPPELIN_HOME}/lib"
  177. addJarInDir "${ZEPPELIN_HOME}/zeppelin-interpreter/target/lib"
  178. addJarInDir "${ZEPPELIN_HOME}/zeppelin-zengine/target/lib"
  179. addJarInDir "${ZEPPELIN_HOME}/zeppelin-server/target/lib"
  180. addJarInDir "${ZEPPELIN_HOME}/zeppelin-web/target/lib"
  181.  
  182. CLASSPATH+=":${ZEPPELIN_CLASSPATH}"
  183.  
  184. if [[ "${ZEPPELIN_NICENESS}" = "" ]]; then
  185.     export ZEPPELIN_NICENESS=0
  186. fireelemental:bin konradm$ ll
  187. total 112
  188. -rw-r--r--  1 konradm  KAINOS\Domain Users   2.8K Sep 13 15:40 common.cmd
  189. -rw-r--r--  1 konradm  KAINOS\Domain Users   4.0K Sep 13 15:40 common.sh
  190. -rw-r--r--  1 konradm  KAINOS\Domain Users   1.2K Sep 13 15:40 functions.cmd
  191. -rw-r--r--  1 konradm  KAINOS\Domain Users   4.0K Sep 13 15:40 functions.sh
  192. -rwxr-xr-x  1 konradm  KAINOS\Domain Users   1.7K Sep 13 15:40 install-interpreter.sh
  193. -rw-r--r--  1 konradm  KAINOS\Domain Users   5.0K Sep 13 15:40 interpreter.cmd
  194. -rwxr-xr-x  1 konradm  KAINOS\Domain Users   6.0K Sep 13 15:40 interpreter.sh
  195. -rwxr-xr-x  1 konradm  KAINOS\Domain Users   6.6K Sep 13 15:40 zeppelin-daemon.sh
  196. -rw-r--r--  1 konradm  KAINOS\Domain Users   3.1K Sep 13 15:40 zeppelin.cmd
  197. -rwxr-xr-x  1 konradm  KAINOS\Domain Users   2.9K Sep 13 15:40 zeppelin.sh
  198. fireelemental:bin konradm$ ./zeppelin-daemon.sh start
  199. Zeppelin start                                             [  OK  ]
  200. fireelemental:bin konradm$ ./zeppelin-daemon.sh stop
  201. Zeppelin stop                                              [  OK  ]
  202. fireelemental:bin konradm$ spark-shell
  203. Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
  204. Setting default log level to "WARN".
  205. To adjust logging level use sc.setLogLevel(newLevel).
  206. 16/10/18 12:19:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  207. 16/10/18 12:19:43 WARN Utils: Your hostname, fireelemental.local resolves to a loopback address: 127.0.0.1, but we couldn't find any external IP address!
  208. 16/10/18 12:19:43 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
  209. 16/10/18 12:19:44 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
  210. Spark context Web UI available at http://127.0.0.1:4040
  211. Spark context available as 'sc' (master = local[*], app id = local-1476785983992).
  212. Spark session available as 'spark'.
  213. Welcome to
  214.       ____              __
  215.      / __/__  ___ _____/ /__
  216.     _\ \/ _ \/ _ `/ __/  '_/
  217.   /___/ .__/\_,_/_/ /_/\_\  version 2.0.0
  218.      /_/
  219.        
  220. Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
  221. Type in expressions to have them evaluated.
  222. Type :help for more information.
  223.  
  224. scala> :q
  225. fireelemental:bin konradm$ sbt
  226. [info] Set current project to bin (in build file:/Users/konradm/ApacheZeppelin/bin/)
  227. > :q
  228. [error] Expected symbol
  229. [error] Not a valid command: :
  230. [error] :q
  231. [error]  ^
  232. > fireelemental:bin konradm$
  233. fireelemental:bin konradm$
  234. fireelemental:bin konradm$ ll
  235. total 112
  236. -rw-r--r--  1 konradm  KAINOS\Domain Users   2.8K Sep 13 15:40 common.cmd
  237. -rw-r--r--  1 konradm  KAINOS\Domain Users   4.0K Sep 13 15:40 common.sh
  238. -rw-r--r--  1 konradm  KAINOS\Domain Users   1.2K Sep 13 15:40 functions.cmd
  239. -rw-r--r--  1 konradm  KAINOS\Domain Users   4.0K Sep 13 15:40 functions.sh
  240. -rwxr-xr-x  1 konradm  KAINOS\Domain Users   1.7K Sep 13 15:40 install-interpreter.sh
  241. -rw-r--r--  1 konradm  KAINOS\Domain Users   5.0K Sep 13 15:40 interpreter.cmd
  242. -rwxr-xr-x  1 konradm  KAINOS\Domain Users   6.0K Sep 13 15:40 interpreter.sh
  243. drwxr-xr-x  3 konradm  KAINOS\Domain Users   102B Oct 18 12:19 target
  244. -rwxr-xr-x  1 konradm  KAINOS\Domain Users   6.6K Sep 13 15:40 zeppelin-daemon.sh
  245. -rw-r--r--  1 konradm  KAINOS\Domain Users   3.1K Sep 13 15:40 zeppelin.cmd
  246. -rwxr-xr-x  1 konradm  KAINOS\Domain Users   2.9K Sep 13 15:40 zeppelin.sh
  247. fireelemental:bin konradm$ pwd
  248. /Users/konradm/ApacheZeppelin/bin
  249. fireelemental:bin konradm$ cd
  250. fireelemental:~ konradm$ ll
  251. total 3936
  252. drwxr-xr-x   3 konradm  KAINOS\Domain Users   102B Jul 16 21:23 AndroidStudioProjects
  253. drwxr-xr-x  57 konradm  KAINOS\Domain Users   1.9K Sep 28 19:25 ApacheZeppelin
  254. drwx------   6 konradm  KAINOS\Domain Users   204B Oct 16 18:16 Applications
  255. drwx------+ 21 konradm  KAINOS\Domain Users   714B Oct 17 21:16 Desktop
  256. drwxr-xr-x   3 konradm  KAINOS\Domain Users   102B Oct 11 13:33 DockerImages
  257. drwx------+ 18 konradm  KAINOS\Domain Users   612B Oct 14 22:39 Documents
  258. drwx------+ 17 konradm  KAINOS\Domain Users   578B Oct 18 09:32 Downloads
  259. drwxr-xr-x   8 konradm  KAINOS\Domain Users   272B Sep 21 12:01 IdeaProjects
  260. drwx------@ 62 konradm  KAINOS\Domain Users   2.1K Aug 12 23:03 Library
  261. drwx------+  3 konradm  KAINOS\Domain Users   102B Jun 30 14:38 Movies
  262. drwx------+  4 konradm  KAINOS\Domain Users   136B Jul 16 21:31 Music
  263. drwx------+  5 konradm  KAINOS\Domain Users   170B Jul 30 11:42 Pictures
  264. drwxr-xr-x  13 konradm  KAINOS\Domain Users   442B Oct 17 15:57 PolishMediaTextMining-ShinyVis
  265. drwxr-xr-x+  5 konradm  KAINOS\Domain Users   170B Jun 30 14:38 Public
  266. drwx------   4 konradm  KAINOS\Domain Users   136B Sep 28 18:55 VirtualBox VMs
  267. drwxr-xr-x  14 konradm  KAINOS\Domain Users   476B Sep  8 17:11 anaconda
  268. -rw-r--r--   1 konradm  KAINOS\Domain Users   678B Sep 20 15:48 derby.log
  269. drwxr-xr-x   9 konradm  KAINOS\Domain Users   306B Sep 20 15:48 metastore_db
  270. -rw-r--r--   1 konradm  KAINOS\Domain Users   8.5K Jul  4 16:41 npm-debug.log
  271. -rw-r--r--   1 konradm  KAINOS\Domain Users   920K Sep 15 10:48 rodeo.log
  272. -rw-r--r--   1 konradm  KAINOS\Domain Users   1.0M Sep 12 12:27 rodeo1.log
  273. drwxr-xr-x   6 konradm  KAINOS\Domain Users   204B Jul  2 00:39 target
  298. fireelemental:~ konradm$
  299. fireelemental:~ konradm$ cd
  300. fireelemental:~ konradm$ spark-shell
  301. Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
  302. Setting default log level to "WARN".
  303. To adjust logging level use sc.setLogLevel(newLevel).
  304. 16/10/18 12:59:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  305. 16/10/18 12:59:17 WARN Utils: Your hostname, fireelemental.local resolves to a loopback address: 127.0.0.1, but we couldn't find any external IP address!
  306. 16/10/18 12:59:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
  307. 16/10/18 12:59:18 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
  308. Spark context Web UI available at http://127.0.0.1:4040
  309. Spark context available as 'sc' (master = local[*], app id = local-1476788358268).
  310. Spark session available as 'spark'.
  311. Welcome to
  312.       ____              __
  313.      / __/__  ___ _____/ /__
  314.     _\ \/ _ \/ _ `/ __/  '_/
  315.   /___/ .__/\_,_/_/ /_/\_\  version 2.0.0
  316.      /_/
  317.        
  318. Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
  319. Type in expressions to have them evaluated.
  320. Type :help for more information.
  321.  
  322. scala> sc.version
  323. res0: String = 2.0.0
  324.  
  325. scala> :import
  326. 1) import spark.implicits._       (59 terms, 38 are implicit)
  327. 2) import spark.sql               (1 terms)
  328.  
  329. scala> spark
  330. res1: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@2c579202
  331.  
  332. scala> spark.
  333. baseRelationToDataFrame   conf              createDataset    emptyDataset   implicits         newSession   read         sparkContext   sqlContext   streams   udf      
  334. catalog                   createDataFrame   emptyDataFrame   experimental   listenerManager   range        readStream   sql            stop         table     version  
  335.  
  336. scala> spark.
  337. !=   ==                        conf              emptyDataset   experimental   implicits         newSession   read           sqlContext     table      wait  
  338. ##   asInstanceOf              createDataFrame   ensuring       formatted      isInstanceOf      notify       readStream     stop           toString   →      
  339. +    baseRelationToDataFrame   createDataset     eq             getClass       listenerManager   notifyAll    sparkContext   streams        udf              
  340. ->   catalog                   emptyDataFrame    equals         hashCode       ne                range        sql            synchronized   version          
  341.  
  342. scala> sc.
  343. accumulable             broadcast               files                      hadoopConfiguration   makeRDD             sequenceFile        submitJob        
  344. accumulableCollection   cancelAllJobs           getAllPools                hadoopFile            master              setCallSite         textFile        
  345. accumulator             cancelJobGroup          getCheckpointDir           hadoopRDD             newAPIHadoopFile    setCheckpointDir    uiWebUrl        
  346. addFile                 clearCallSite           getConf                    isLocal               newAPIHadoopRDD     setJobDescription   union            
  347. addJar                  clearJobGroup           getExecutorMemoryStatus    isStopped             objectFile          setJobGroup         version          
  348. addSparkListener        collectionAccumulator   getExecutorStorageStatus   jars                  parallelize         setLocalProperty    wholeTextFiles  
  349. appName                 defaultMinPartitions    getLocalProperty           killExecutor          range               setLogLevel                          
  350. applicationAttemptId    defaultParallelism      getPersistentRDDs          killExecutors         register            sparkUser                            
  351. applicationId           deployMode              getPoolForName             listFiles             requestExecutors    startTime                            
  352. binaryFiles             doubleAccumulator       getRDDStorageInfo          listJars              runApproximateJob   statusTracker                        
  353. binaryRecords           emptyRDD                getSchedulingMode          longAccumulator       runJob              stop                                
  354.  
  355. scala> spark.udf
  356. res2: org.apache.spark.sql.UDFRegistration = org.apache.spark.sql.UDFRegistration@74832504
  357.  
  358. scala> sp
  359. spark   spark_partition_id   specialized   spire   split
  360.  
  361. scala> spark.sqlContext.
  362. applySchema               createDataFrame       emptyDataFrame   implicits   jsonRDD           parquetFile   setConf        streams      udf            
  363. baseRelationToDataFrame   createDataset         experimental     isCached    listenerManager   range         sparkContext   table        uncacheTable  
  364. cacheTable                createExternalTable   getAllConfs      jdbc        load              read          sparkSession   tableNames                  
  365. clearCache                dropTempTable         getConf          jsonFile    newSession        readStream    sql            tables                      
  366.  
  367. scala> spark.read.
  368. csv   format   jdbc   json   load   option   options   orc   parquet   schema   table   text   textFile
  369.  
  370. scala> spark.read.
  371. !=   +    ==             csv        eq       format      getClass   isInstanceOf   json   ne       notifyAll   options   parquet   synchronized   text       toString   →  
  372. ##   ->   asInstanceOf   ensuring   equals   formatted   hashCode   jdbc           load   notify   option      orc       schema    table          textFile   wait          
  373.  
  374. scala> spark.read.csv.
  375. csv   format   jdbc   json   load   option   options   orc   parquet   schema   table   text   textFile
  376.  
  377. scala> spark.read.csv.
  378. !=   +    ==             csv        eq       format      getClass   isInstanceOf   json   ne       notifyAll   options   parquet   synchronized   text       toString   →  
  379. ##   ->   asInstanceOf   ensuring   equals   formatted   hashCode   jdbc           load   notify   option      orc       schema    table          textFile   wait          
  380.  
  381. scala> spark.read.csv
  382.   def csv(paths: String*): org.apache.spark.sql.DataFrame   def csv(path: String): org.apache.spark.sql.DataFrame
  383.  
  394. scala> spark.read.csv("/Users/konradm/Desktop/people.csv")
  395. res3: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 1 more field]
  396.  
  397. scala> val frm = spark.read.csv("/Users/konradm/Desktop/people.csv")
  398. frm: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 1 more field]
  399.  
  400. scala> frm.show
  401. +---+------+----+
  402. |_c0|   _c1| _c2|
  403. +---+------+----+
  404. | id|  name| age|
  405. |  0| Jacek|  42|
  406. |  1|Unmesh|  10|
  407. |  2|Ulrich|  12|
  408. |  3|Ulrich|1220|
  409. |  4|Ullash|1220|
  410. +---+------+----+
  411.  
  412.  
  413. scala> val frm = spark.read.option("header", true).csv("/Users/konradm/Desktop/people.csv")
  414. frm: org.apache.spark.sql.DataFrame = [id: string, name: string ... 1 more field]
  415.  
  416. scala> frm.show
  417. +---+------+----+
  418. | id|  name| age|
  419. +---+------+----+
  420. |  0| Jacek|  42|
  421. |  1|Unmesh|  10|
  422. |  2|Ulrich|  12|
  423. |  3|Ulrich|1220|
  424. |  4|Ullash|1220|
  425. +---+------+----+
  426.  
  427.  
  428. scala> select * from frm
  429. <console>:24: error: not found: value select
  430.       select * from frm
  431.       ^
  432. <console>:24: error: not found: value from
  433.       select * from frm
  434.                ^
  435.  
  436. scala> frm.create
  437. createOrReplaceTempView   createTempView
  438.  
  439. scala> frm.createOrReplaceTempView("mojCSV")
  440.  
  441. scala> sql("select * from mojCSV").show
  442. +---+------+----+
  443. | id|  name| age|
  444. +---+------+----+
  445. |  0| Jacek|  42|
  446. |  1|Unmesh|  10|
  447. |  2|Ulrich|  12|
  448. |  3|Ulrich|1220|
  449. |  4|Ullash|1220|
  450. +---+------+----+
  451.  
  452.  
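A quick aside on the two SQL calls: the bare sql(...) above works because of the `import spark.sql` shown in the :import output earlier in this session. A minimal sketch of the same query written against the session explicitly, reusing the "mojCSV" view registered above (the WHERE clause is only there to show the view behaves like a table):

  // spark.sql returns another DataFrame, so show/filter/etc. still apply.
  spark.sql("select name, age from mojCSV where name = 'Ulrich'").show()
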
  453. scala> case class Person(id: Long, name: String, age: Int)
  454. defined class Person
  455.  
  456. scala> frm.filter(p: Person => p.id == 2)
  457. <console>:1: error: identifier expected but integer literal found.
  458. frm.filter(p: Person => p.id == 2)
  459.                                ^
  460.  
  461. scala> frm.filter{p: Person => p.id == 2}
  462. <console>:28: error: type mismatch;
  463. found   : Person => Boolean
  464. required: org.apache.spark.sql.Row => Boolean
  465.       frm.filter{p: Person => p.id == 2}
  466.                            ^
  467.  
  468. scala> sql("select * from mojCSV").as[Person]
  469. org.apache.spark.sql.AnalysisException: Cannot up cast mojcsv.`id` from string to bigint as it may truncate
  470. The type path of the target object is:
  471. - field (class: "scala.Long", name: "id")
  472. - root class: "Person"
  473. You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
  474.  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:1990)
  475.  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$34$$anonfun$applyOrElse$14.applyOrElse(Analyzer.scala:2020)
  476.  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$34$$anonfun$applyOrElse$14.applyOrElse(Analyzer.scala:2007)
  477.  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279)
  478.  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279)
  479.  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
  480.  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278)
  481.  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284)
  482.  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284)
  483.  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321)
  484.  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
  485.  at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319)
  486.  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284)
  487.  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284)
  488.  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284)
  489.  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5$$anonfun$apply$11.apply(TreeNode.scala:350)
  490.  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  491.  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  492.  at scala.collection.immutable.List.foreach(List.scala:381)
  493.  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  494.  at scala.collection.immutable.List.map(List.scala:285)
  495.  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:348)
  496.  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
  497.  at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319)
  498.  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284)
  499.  at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:156)
  500.  at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:166)
  501.  at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$4.apply(QueryPlan.scala:175)
  502.  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
  503.  at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:175)
  504.  at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressions(QueryPlan.scala:144)
  505.  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$34.applyOrElse(Analyzer.scala:2007)
  506.  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$34.applyOrElse(Analyzer.scala:2003)
  507.  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
  508.  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
  509.  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
  510.  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:60)
  511.  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.apply(Analyzer.scala:2003)
  512.  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.apply(Analyzer.scala:1988)
  513.  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
  514.  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
  515.  at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
  516.  at scala.collection.immutable.List.foldLeft(List.scala:84)
  517.  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
  518.  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
  519.  at scala.collection.immutable.List.foreach(List.scala:381)
  520.  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
  521.  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolveAndBind(ExpressionEncoder.scala:244)
  522.  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:210)
  523.  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
  524.  at org.apache.spark.sql.Dataset$.apply(Dataset.scala:59)
  525.  at org.apache.spark.sql.Dataset.as(Dataset.scala:359)
  526.  ... 48 elided
  527.  
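The AnalysisException above is Spark refusing to silently narrow the string columns of the header-only frm into the Long/Int fields of Person. The session switches to inferSchema next; the other way out, which the error message itself suggests, is an explicit cast before .as[Person]. A rough sketch reusing frm and Person from above (typedPeople is an illustrative name):

  import org.apache.spark.sql.functions.col

  // Cast the string columns to the types Person expects, then ask for a typed Dataset.
  // (spark.implicits._ is already in scope in spark-shell and supplies the Person encoder.)
  val typedPeople = frm
    .select(col("id").cast("long").as("id"),
            col("name"),
            col("age").cast("int").as("age"))
    .as[Person]
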
  528. scala> val frm = spark.read.option("header", true).option("inferSchema", true).csv("/Users/konradm/Desktop/people.csv")
  529. frm: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
  530.  
  531. scala> frm.printSchema
  532. root
  533. |-- id: integer (nullable = true)
  534. |-- name: string (nullable = true)
  535. |-- age: integer (nullable = true)
  536.  
  537.  
  538. scala> frm.as[Person]
  539. res12: org.apache.spark.sql.Dataset[Person] = [id: int, name: string ... 1 more field]
  540.  
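inferSchema makes Spark take an extra pass over the file to guess column types; for a job that runs repeatedly, an explicit schema is the usual alternative. A small sketch assuming the same id/name/age layout of people.csv shown above (peopleSchema and people are illustrative names):

  import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

  val peopleSchema = StructType(Seq(
    StructField("id", IntegerType),
    StructField("name", StringType),
    StructField("age", IntegerType)))

  // schema(...) replaces inferSchema and the sampling pass it implies.
  val people = spark.read
    .option("header", true)
    .schema(peopleSchema)
    .csv("/Users/konradm/Desktop/people.csv")
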
  541. scala> frm.filter(p => p.int == 2)
  542. <console>:26: error: value int is not a member of org.apache.spark.sql.Row
  543.       frm.filter(p => p.int == 2)
  544.                         ^
  545.  
  546. scala> frm.filter(p => p.id == 2)
  547. <console>:26: error: value id is not a member of org.apache.spark.sql.Row
  548.       frm.filter(p => p.id == 2)
  549.                         ^
  550.  
  551. scala> frm.where
  552.                                                                                                            
  553. def where(conditionExpr: String): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]                    
  554. def where(condition: org.apache.spark.sql.Column): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]  
  555.  
  556. scala> frm.filter
  557.                                                                                                                                                        
  558. def filter(func: org.apache.spark.api.java.function.FilterFunction[org.apache.spark.sql.Row]): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]  
  559. def filter(func: org.apache.spark.sql.Row => Boolean): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]                                          
  560. def filter(conditionExpr: String): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]                                                              
  561. def filter(condition: org.apache.spark.sql.Column): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]                                              
  562.  
  563. scala> frm.filter(p: Person => p.id == 2)
  564. <console>:1: error: identifier expected but integer literal found.
  565. frm.filter(p: Person => p.id == 2)
  566.                                ^
  567.  
  568. scala> frm.filter(p: Person => p.id === 2)
  569. <console>:1: error: identifier expected but integer literal found.
  570. frm.filter(p: Person => p.id === 2)
  571.                                 ^
  572.  
  573. scala> frm.filter(p: Person => p.id = 2)
  574. <console>:1: error: ')' expected but '=' found.
  575. frm.filter(p: Person => p.id = 2)
  576.                             ^
  577.  
  578. scala> frm.filter(p: Person => p.id == 2)
  579. <console>:1: error: identifier expected but integer literal found.
  580. frm.filter(p: Person => p.id == 2)
  581.                                ^
  582.  
  583. scala> frm
  584. res15: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
  585.  
  586. scala> frm.printSchema
  587. root
  588. |-- id: integer (nullable = true)
  589. |-- name: string (nullable = true)
  590. |-- age: integer (nullable = true)
  591.  
  592.  
  593. scala> frm.filter
  594.                                                                                                                                                        
  595. def filter(func: org.apache.spark.api.java.function.FilterFunction[org.apache.spark.sql.Row]): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]  
  596. def filter(func: org.apache.spark.sql.Row => Boolean): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]                                          
  597. def filter(conditionExpr: String): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]                                                              
  598. def filter(condition: org.apache.spark.sql.Column): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]                                              
  599.  
  600. scala> frm.filter($"age" > 40)
  601. res17: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [id: int, name: string ... 1 more field]
  602.  
  603. scala> frm.filter($"age" > 40).show
  604. +---+------+----+
  605. | id|  name| age|
  606. +---+------+----+
  607. |  0| Jacek|  42|
  608. |  3|Ulrich|1220|
  609. |  4|Ullash|1220|
  610. +---+------+----+
  611.  
  612.  
  613. scala> frm.filter($"age".> 40).show
  614. <console>:1: error: ')' expected but integer literal found.
  615. frm.filter($"age".> 40).show
  616.                    ^
  617.  
  618. scala> frm.filter($"age" > 40).show
  619. +---+------+----+
  620. | id|  name| age|
  621. +---+------+----+
  622. |  0| Jacek|  42|
  623. |  3|Ulrich|1220|
  624. |  4|Ullash|1220|
  625. +---+------+----+
  626.  
  627.  
  628. scala> frm.filter('age > 40).show
  629. +---+------+----+
  630. | id|  name| age|
  631. +---+------+----+
  632. |  0| Jacek|  42|
  633. |  3|Ulrich|1220|
  634. |  4|Ullash|1220|
  635. +---+------+----+
  636.  
  637.  
  638. scala> frm.filter(_.age > 40).show
  639. <console>:26: error: value age is not a member of org.apache.spark.sql.Row
  640.        frm.filter(_.age > 40).show
  641.                     ^
  642.  
  643. scala> frm.filter(p => p.age > 40).show
  644. <console>:26: error: value age is not a member of org.apache.spark.sql.Row
  645.        frm.filter(p => p.age > 40).show
  646.                          ^
  647.  
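The two failures above come down to the DataFrame/Dataset split: frm is a Dataset[Row], so a plain Scala lambda only sees Row and has no age field, while Column expressions or a typed Dataset[Person] do. A short side-by-side sketch using the definitions from this session (it is exactly what the next few commands converge on):

  // Untyped: Column expressions against the DataFrame (Dataset[Row]).
  frm.filter($"age" > 40).show()
  frm.filter('age > 40).show()

  // Typed: convert to Dataset[Person] first, then an ordinary predicate on fields works.
  frm.as[Person].filter(_.age > 40).show()
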
  648. scala> frm
  649. res23: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
  650.  
  651. scala> frm.as[Person].filter(_.age > 40)
  652. res24: org.apache.spark.sql.Dataset[Person] = [id: int, name: string ... 1 more field]
  653.  
  654. scala> frm.as[Person].filter(p => p.age > 40)
  655. res25: org.apache.spark.sql.Dataset[Person] = [id: int, name: string ... 1 more field]
  656.  
  657. scala> frm.as[Person].groupBy(p => p.age > 40).show
  658. <console>:28: error: missing parameter type
  659.        frm.as[Person].groupBy(p => p.age > 40).show
  660.                               ^
  661.  
  662. scala> frm.as[Person].groupByKey
  663.  
  664. def groupByKey[K](func: org.apache.spark.api.java.function.MapFunction[Person,K],encoder: org.apache.spark.sql.Encoder[K]): org.apache.spark.sql.KeyValueGroupedDataset[K,Person]
  665. def groupByKey[K](func: Person => K)(implicit evidence$4: org.apache.spark.sql.Encoder[K]): org.apache.spark.sql.KeyValueGroupedDataset[K,Person]
  666.  
  667. scala> frm.as[Person].groupByKey(_.age > 40).show
  668. <console>:28: error: value show is not a member of org.apache.spark.sql.KeyValueGroupedDataset[Boolean,Person]
  669.        frm.as[Person].groupByKey(_.age > 40).show
  670.                                              ^
  671.  
  672. scala> frm.as[Person].groupByKey(_.age > 40)
  673. res28: org.apache.spark.sql.KeyValueGroupedDataset[Boolean,Person] = org.apache.spark.sql.KeyValueGroupedDataset@3a66ac76
  674.  
  675. scala> frm.as[Person].groupByKey(_.age > 40).count
  676. res29: org.apache.spark.sql.Dataset[(Boolean, Long)] = [value: boolean, count(1): bigint]
  677.  
  678. scala> frm.as[Person].groupByKey(_.age > 40).count.show
  679. +-----+--------+
  680. |value|count(1)|
  681. +-----+--------+
  682. | true|       3|
  683. |false|       2|
  684. +-----+--------+
  685.  
  686.  
  687. scala> frm.as[Person].reduceByKey(_.age > 40).count.show
  688. <console>:28: error: value reduceByKey is not a member of org.apache.spark.sql.Dataset[Person]
  689.        frm.as[Person].reduceByKey(_.age > 40).count.show
  690.                       ^
  691.  
  692. scala> frm.as[Person].groupByKey(_.age > 40).count.show
  693. +-----+--------+
  694. |value|count(1)|
  695. +-----+--------+
  696. | true|       3|
  697. |false|       2|
  698. +-----+--------+
  699.  
  700.  
  701. scala> frm.as[Person].groupByKey(identity).count.show
  702. +---------------+--------+
  703. |            key|count(1)|
  704. +---------------+--------+
  705. |  [1,Unmesh,10]|       1|
  706. |[3,Ulrich,1220]|       1|
  707. |   [0,Jacek,42]|       1|
  708. |  [2,Ulrich,12]|       1|
  709. |[4,Ullash,1220]|       1|
  710. +---------------+--------+
  711.  
  712.  
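groupByKey is the typed half of the API; the same over-40 split can be written on the untyped side by grouping on a derived Column. A one-line sketch equivalent to the groupByKey(_.age > 40).count above (the over40 alias is illustrative):

  // Untyped equivalent: group the DataFrame by a boolean Column instead of a typed key.
  frm.groupBy(($"age" > 40).as("over40")).count.show()
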
  713. scala> frm.groupBy().count
  714. res34: org.apache.spark.sql.DataFrame = [count: bigint]
  715.  
  716. scala> frm.groupBy().count.show
  717. +-----+
  718. |count|
  719. +-----+
  720. |    5|
  721. +-----+
  722.  
  723.  
  724. scala> frm.groupBy().count.expl
  725. explain   explode
  726.  
  727. scala> frm.groupBy().count.explain
  728. == Physical Plan ==
  729. *HashAggregate(keys=[], functions=[count(1)])
  730. +- Exchange SinglePartition
  731.    +- *HashAggregate(keys=[], functions=[partial_count(1)])
  732.       +- *Scan csv [] Format: CSV, InputPaths: file:/Users/konradm/Desktop/people.csv, PushedFilters: [], ReadSchema: struct<>
  733.  
  734. scala> frm.groupBy().count.explain(extended = true)
  735. == Parsed Logical Plan ==
  736. Aggregate [count(1) AS count#230L]
  737. +- Relation[id#58,name#59,age#60] csv
  738.  
  739. == Analyzed Logical Plan ==
  740. count: bigint
  741. Aggregate [count(1) AS count#230L]
  742. +- Relation[id#58,name#59,age#60] csv
  743.  
  744. == Optimized Logical Plan ==
  745. Aggregate [count(1) AS count#230L]
  746. +- Project
  747.    +- Relation[id#58,name#59,age#60] csv
  748.  
  749. == Physical Plan ==
  750. *HashAggregate(keys=[], functions=[count(1)], output=[count#230L])
  751. +- Exchange SinglePartition
  752.    +- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#235L])
  753.       +- *Scan csv [] Format: CSV, InputPaths: file:/Users/konradm/Desktop/people.csv, PushedFilters: [], ReadSchema: struct<>
  754.  
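explain prints the plans to the console; the same information is reachable programmatically through the developer-level queryExecution field, which can be handy in tests or logs. A sketch under the Spark 2.0 Dataset API used throughout this session (counted is an illustrative name):

  val counted = frm.groupBy().count
  // queryExecution exposes the plans that explain(extended = true) prints.
  println(counted.queryExecution.optimizedPlan)
  println(counted.queryExecution.executedPlan)
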
  755. scala> // frm.as[Person].groupByKey(_.age > 40).count.show
  756.  
  757. scala> val windowSpec = Window.partitionBy($"age" > 40).orderBy("age")
  758. <console>:23: error: not found: value Window
  759.        val windowSpec = Window.partitionBy($"age" > 40).orderBy("age")
  760.                         ^
  761.  
  762. scala> import org.apache.spark.sql.
  763. AnalysisException      DataFrameReader          Encoder               KeyValueGroupedDataset     SQLContext     TypedColumn       execution     jdbc        types  
  764. Column                 DataFrameStatFunctions   Encoders              RelationalGroupedDataset   SQLImplicits   UDFRegistration   expressions   package     util    
  765. ColumnName             DataFrameWriter          ExperimentalMethods   Row                        SaveMode       api               functions     sources            
  766. DataFrame              Dataset                  ForeachWriter         RowFactory                 SparkSession   catalog           hive          streaming          
  767. DataFrameNaFunctions   DatasetHolder            InternalOutputModes   RuntimeConfig              Strategy       catalyst          internal      test                
  768.  
  769. scala> import org.apache.
  770. avro      commons   derby       hadoop   html     http   jute    mesos   parquet   thrift   xbean    xml              
  771. calcite   curator   directory   hive     htrace   ivy    log4j   oro     spark     wml      xerces   zookeeper        
  772.  
  773. scala> import org.apache.spark.
  774. Accumulable                GrowableAccumulableParam         RangeDependency                 SparkException          TaskEndReason                   deploy     repl        
  775. AccumulableParam           HashPartitioner                  RangePartitioner                SparkExecutorInfo       TaskFailedReason                executor   rpc          
  776. Accumulator                InternalAccumulator              Resubmitted                     SparkExecutorInfoImpl   TaskKilled                      graphx     scheduler    
  777. AccumulatorParam           InterruptibleIterator            SPARK_BRANCH                    SparkFiles              TaskKilledException             input      security    
  778. Aggregator                 JavaFutureActionWrapper          SPARK_BUILD_DATE                SparkFirehoseListener   TaskNotSerializableException    internal   serializer  
  779. CleanerListener            JobExecutionStatus               SPARK_BUILD_USER                SparkJobInfo            TaskResultLost                  io         shuffle      
  780. CleanupTask                JobSubmitter                     SPARK_REPO_URL                  SparkJobInfoImpl        TaskSchedulerIsSet              launcher   sql          
  781. CleanupTaskWeakReference   MapOutputStatistics              SPARK_REVISION                  SparkMasterRegex        TaskState                       mapred     status      
  782. ComplexFutureAction        MapOutputTrackerMaster           SPARK_VERSION                   SparkStageInfo          TestUtils                       memory     storage      
  783. Dependency                 MapOutputTrackerMasterEndpoint   SerializableWritable            SparkStageInfoImpl      ThrowableSerializationWrapper   metrics    streaming    
  784. ExceptionFailure           MapOutputTrackerMessage          ShuffleDependency               SparkStatusTracker      UnknownReason                   ml         tags        
  785. ExecutorAllocationClient   MapOutputTrackerWorker           SimpleFutureAction              SpillListener           WritableConverter               mllib      ui          
  786. ExecutorLostFailure        NarrowDependency                 SparkConf                       StopMapOutputTracker    WritableFactory                 network    unsafe      
  787. ExpireDeadHosts            OneToOneDependency               SparkContext                    Success                 annotation                      package    unused      
  788. FetchFailed                Partition                        SparkDriverExecutionException   TaskCommitDenied        api                             partial    util        
  789. FutureAction               Partitioner                      SparkEnv                        TaskContext             broadcast                       rdd                    
  790.  
  791. scala> import org.apache.spark.sql.expressions.Window
  792. Window   WindowSpec
  793.  
  794. scala> import org.apache.spark.sql.expressions
  795. import org.apache.spark.sql.expressions
  796.  
  797. scala> val windowSpec = Window.partitionBy($"age" > 40).orderBy("age")
  798. <console>:24: error: not found: value Window
  799.        val windowSpec = Window.partitionBy($"age" > 40).orderBy("age")
  800.                         ^
  801.  
  802. scala> import org.apache.spark.sql.expressions._
  803. import org.apache.spark.sql.expressions._
  804.  
  805. scala> val windowSpec = Window.partitionBy($"age" > 40).orderBy("age")
  806. windowSpec: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@8ad08cc
  807.  
  808. scala> people.withColumn("rank", rank over windowSpec).filter($"rank" === 1).select('name, 'age).show
  809. <console>:30: error: not found: value people
  810.        people.withColumn("rank", rank over windowSpec).filter($"rank" === 1).select('name, 'age).show
  811.        ^
  812.  
  813. scala> frm.withColumn("rank", rank over windowSpec).filter($"rank" === 1).select('name, 'age).show
  814. +------+---+
  815. |  name|age|
  816. +------+---+
  817. | Jacek| 42|
  818. |Unmesh| 10|
  819. +------+---+
  820.  
  821.  
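Pulling the window-function fragment above together: it needs org.apache.spark.sql.expressions.Window plus the rank function (org.apache.spark.sql.functions._ is pre-imported in spark-shell, elsewhere it has to be imported). A consolidated sketch of the query from this session, which keeps the lowest-age row on each side of the age > 40 split:

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions.rank

  // Partition by the boolean age > 40, order each partition by age,
  // and keep only the first-ranked row of each partition.
  val windowSpec = Window.partitionBy($"age" > 40).orderBy("age")
  frm.withColumn("rank", rank().over(windowSpec))   // same call as the infix `rank over windowSpec` above
     .filter($"rank" === 1)
     .select('name, 'age)
     .show()
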
  822. scala> import org.apache.spark.ml._
  823. import org.apache.spark.ml._
  824.  
  825. scala> org.apache.spark.ml.
  826. Estimator   PipelineModel     Predictor         UnaryTransformer   classification   feature   optim     python           regression   tree    
  827. Model       PipelineStage     PredictorParams   ann                clustering       impl      package   r                source       tuning  
  828. Pipeline    PredictionModel   Transformer       attribute          evaluation       linalg    param     recommendation   stat         util    
  829.  
  830. scala> org.apache.spark.ml.tree.
  831. CategoricalSplit               DecisionTreeModelReadWrite    GBTParams                  LeafNode                                RandomForestRegressionModelParams   impl  
  832. ContinuousSplit                DecisionTreeParams            HasFeatureSubsetStrategy   Node                                    RandomForestRegressorParams                
  833. DecisionTreeClassifierParams   DecisionTreeRegressorParams   HasNumTrees                RandomForestClassificationModelParams   Split                                      
  834. DecisionTreeModel              EnsembleModelReadWrite        InternalNode               RandomForestClassifierParams            TreeEnsembleParams                        
  835.  
  836. scala> import org.apache.spark.ml.Transformer
  837. import org.apache.spark.ml.Transformer
  838.  
  839. scala> //2 transformers
  840.  
  841. scala> //tokenizer
  842.  
  843. scala> //hashing tf
  844.  
  845. scala> import org.apache.spark.ml.feature._
  846. import org.apache.spark.ml.feature._
  847.  
  848. scala> val tok = new Tokenizer()
  849. tok: org.apache.spark.ml.feature.Tokenizer = tok_680b50261bfe
  850.  
  851. scala> val emails = Seq(
  852.      | Display all 759 possibilities? (y or n)
  853.      | (0, "This is NOT a spam", 1)
  854.      | (1, "Hey Jacek, Wanna salary rise?", 0),
  855.      | (2, "No i jak Pawel, dobrze sie bawisz?", 0),
  856.      |
  857.      |
  858. You typed two blank lines.  Starting a new command.
  859.  
  860. scala> val emails = Seq(
  861.      |  (0, "this is NOT a spam", 1),
  862.      |  (1, "Hey Jacek, Wanna salaary rise?", 0),
  863.      |  (2, "No i jak Pawel, dobrze sie bawisz?", 0),
  864.      |  (3, "SPAM VIAGRA ENLARGE P..S", 1)).toDF("id", "email", "label")
  865. emails: org.apache.spark.sql.DataFrame = [id: int, email: string ... 1 more field]
  866.  
  867. scala> emails.show
  868. +---+--------------------+-----+
  869. | id|               email|label|
  870. +---+--------------------+-----+
  871. |  0|  this is NOT a spam|    1|
  872. |  1|Hey Jacek, Wanna ...|    0|
  873. |  2|No i jak Pawel, d...|    0|
  874. |  3|SPAM VIAGRA ENLAR...|    1|
  875. +---+--------------------+-----+
  876.  
  877.  
  878. scala> emails.show(false)
  879. +---+----------------------------------+-----+
  880. |id |email                             |label|
  881. +---+----------------------------------+-----+
  882. |0  |this is NOT a spam                |1    |
  883. |1  |Hey Jacek, Wanna salaary rise?    |0    |
  884. |2  |No i jak Pawel, dobrze sie bawisz?|0    |
  885. |3  |SPAM VIAGRA ENLARGE P..S          |1    |
  886. +---+----------------------------------+-----+
  887.  
  888.  
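Seq(...).toDF("id", "email", "label") works here because spark-shell pre-imports spark.implicits._ and provides the spark session. In a compiled application the same snippet needs both spelled out; a sketch under that assumption (the master and appName values are illustrative):

  import org.apache.spark.sql.SparkSession

  // Outside spark-shell the session and its implicits must be set up explicitly.
  val spark = SparkSession.builder().master("local[*]").appName("emails").getOrCreate()
  import spark.implicits._

  val emails = Seq(
    (0, "this is NOT a spam", 1),
    (1, "Hey Jacek, Wanna salaary rise?", 0),
    (2, "No i jak Pawel, dobrze sie bawisz?", 0),
    (3, "SPAM VIAGRA ENLARGE P..S", 1)).toDF("id", "email", "label")
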
  889. scala> tok.transform
  890. transform   transformSchema
  891.  
  892. scala> tok.transform
  893.  
  894. override def transform(dataset: org.apache.spark.sql.Dataset[_]): org.apache.spark.sql.DataFrame
  895. def transform(dataset: org.apache.spark.sql.Dataset[_],paramMap: org.apache.spark.ml.param.ParamMap): org.apache.spark.sql.DataFrame
  896. def transform(dataset: org.apache.spark.sql.Dataset[_],firstParamPair: org.apache.spark.ml.param.ParamPair[_],otherParamPairs: org.apache.spark.ml.param.ParamPair[_]*): org.apache.spark.sql.DataFrame
  897.  
  904. scala> tok.transform(emails)
  905. java.util.NoSuchElementException: Failed to find a default value for inputCol
  906.   at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:660)
  907.   at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:660)
  908.   at scala.Option.getOrElse(Option.scala:121)
  909.   at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:659)
  910.   at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
  911.   at org.apache.spark.ml.param.Params$class.$(params.scala:664)
  912.   at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
  913.   at org.apache.spark.ml.UnaryTransformer.transformSchema(Transformer.scala:109)
  914.   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:70)
  915.   at org.apache.spark.ml.UnaryTransformer.transform(Transformer.scala:120)
  916.   ... 54 elided
  917.  
  918. scala> tok.setInputCol("email").transform(emails)
  919. res43: org.apache.spark.sql.DataFrame = [id: int, email: string ... 2 more fields]
  920.  
  921. scala> tok.setInputCol("email").transform(emails).show(false)
  922. +---+----------------------------------+-----+------------------------------------------+
  923. |id |email                             |label|tok_680b50261bfe__output                  |
  924. +---+----------------------------------+-----+------------------------------------------+
  925. |0  |this is NOT a spam                |1    |[this, is, not, a, spam]                  |
  926. |1  |Hey Jacek, Wanna salaary rise?    |0    |[hey, jacek,, wanna, salaary, rise?]      |
  927. |2  |No i jak Pawel, dobrze sie bawisz?|0    |[no, i, jak, pawel,, dobrze, sie, bawisz?]|
  928. |3  |SPAM VIAGRA ENLARGE P..S          |1    |[spam, viagra, enlarge, p..s]             |
  929. +---+----------------------------------+-----+------------------------------------------+
  930.  
  931.  
  932. scala> val tokens = tok.setInputCol("email").transform(emails).show(false)
  933. +---+----------------------------------+-----+------------------------------------------+
  934. |id |email                             |label|tok_680b50261bfe__output                  |
  935. +---+----------------------------------+-----+------------------------------------------+
  936. |0  |this is NOT a spam                |1    |[this, is, not, a, spam]                  |
  937. |1  |Hey Jacek, Wanna salaary rise?    |0    |[hey, jacek,, wanna, salaary, rise?]      |
  938. |2  |No i jak Pawel, dobrze sie bawisz?|0    |[no, i, jak, pawel,, dobrze, sie, bawisz?]|
  939. |3  |SPAM VIAGRA ENLARGE P..S          |1    |[spam, viagra, enlarge, p..s]             |
  940. +---+----------------------------------+-----+------------------------------------------+
  941.  
  942. tokens: Unit = ()
  943.  
  944. scala> tokens.printSchema
  945. <console>:41: error: value printSchema is not a member of Unit
  946.        tokens.printSchema
  947.               ^
  948.  
  949. scala> tokens
  950.  
  951. scala> val tokens = tok.setInputCol("email").transform(emails)
  952. tokens: org.apache.spark.sql.DataFrame = [id: int, email: string ... 2 more fields]
  953.  
  954. scala> tokens
  955. res47: org.apache.spark.sql.DataFrame = [id: int, email: string ... 2 more fields]
  956.  
  957. scala> tokens.printSchema
  958. root
  959.  |-- id: integer (nullable = false)
  960.  |-- email: string (nullable = true)
  961.  |-- label: integer (nullable = false)
  962.  |-- tok_680b50261bfe__output: array (nullable = true)
  963.  |    |-- element: string (containsNull = true)
  964.  
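// Note on the `tokens: Unit = ()` mishap above: show() is an action that prints the rows and
// returns Unit, so assigning its result throws the DataFrame away. Keep the DataFrame and call
// show() separately, as the session does next (`tokenized` is an illustrative name):
val tokenized = tok.setInputCol("email").transform(emails)  // DataFrame, safe to reuse
tokenized.show(false)                                       // printing only, returns Unit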
  965.  
  966. scala> val hashTF = new HashingTF().setInputCol(tok.getOutputCol).setOutputCol("features")
  967. hashTF: org.apache.spark.ml.feature.HashingTF = hashingTF_bb9390313079
  968.  
  969. scala> hashTF
  970. res49: org.apache.spark.ml.feature.HashingTF = hashingTF_bb9390313079
  971.  
  972. scala> hashTF.transform
  973. transform   transformSchema
  974.  
  975. scala> hashTF.transform
  976.  
  977. override def transform(dataset: org.apache.spark.sql.Dataset[_]): org.apache.spark.sql.DataFrame
  978. def transform(dataset: org.apache.spark.sql.Dataset[_],paramMap: org.apache.spark.ml.param.ParamMap): org.apache.spark.sql.DataFrame
  979. def transform(dataset: org.apache.spark.sql.Dataset[_],firstParamPair: org.apache.spark.ml.param.ParamPair[_],otherParamPairs: org.apache.spark.ml.param.ParamPair[_]*): org.apache.spark.sql.DataFrame
  980.  
  981. scala> hashTF.transform(tokens)
  982. res50: org.apache.spark.sql.DataFrame = [id: int, email: string ... 3 more fields]
  983.  
  984. scala> hashTF.transform(tokens).show
  985. +---+--------------------+-----+------------------------+--------------------+
  986. | id|               email|label|tok_680b50261bfe__output|            features|
  987. +---+--------------------+-----+------------------------+--------------------+
  988. |  0|  this is NOT a spam|    1|    [this, is, not, a...|(262144,[15889,10...|
  989. |  1|Hey Jacek, Wanna ...|    0|    [hey, jacek,, wan...|(262144,[25736,66...|
  990. |  2|No i jak Pawel, d...|    0|    [no, i, jak, pawe...|(262144,[24417,53...|
  991. |  3|SPAM VIAGRA ENLAR...|    1|    [spam, viagra, en...|(262144,[4428,137...|
  992. +---+--------------------+-----+------------------------+--------------------+
  993.  
  994.  
  995. scala> val hashed = hashTF.transform(tokens)
  996. hashed: org.apache.spark.sql.DataFrame = [id: int, email: string ... 3 more fields]
  997.  
  998. scala> hashed.printSchema
  999. root
  1000.  |-- id: integer (nullable = false)
  1001.  |-- email: string (nullable = true)
  1002.  |-- label: integer (nullable = false)
  1003.  |-- tok_680b50261bfe__output: array (nullable = true)
  1004.  |    |-- element: string (containsNull = true)
  1005.  |-- features: vector (nullable = true)
  1006.  
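// HashingTF maps each token to a bucket with a hash function and counts term frequencies per
// bucket, producing the sparse "features" vectors above; "(262144,[...],[...])" denotes a
// SparseVector of the default size 2^18 = 262144. A minimal sketch (not from the session)
// showing the numFeatures knob; the value 1000 is purely illustrative:
import org.apache.spark.ml.feature.HashingTF

val smallTF = new HashingTF()
  .setInputCol(tok.getOutputCol)
  .setOutputCol("features")
  .setNumFeatures(1000)   // fewer buckets => shorter vectors, but more hash collisions
smallTF.transform(tokens).select("features").show(false)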
  1007.  
  1008. scala> import org.apache.spark.ml.classification._
  1009. import org.apache.spark.ml.classification._
  1010.  
  1011. scala> val logReg = new Logistic
  1012. LogisticAggregator   LogisticRegression        LogisticRegressionParams    LogisticRegressionTrainingSummary  
  1013. LogisticCostFun      LogisticRegressionModel   LogisticRegressionSummary                                      
  1014.  
  1015. scala> val logReg = new LogisticRegression
  1016. LogisticRegression   LogisticRegressionModel   LogisticRegressionParams   LogisticRegressionSummary   LogisticRegressionTrainingSummary
  1017.  
  1018. scala> val logReg = new LogisticRegression()
  1019. logReg: org.apache.spark.ml.classification.LogisticRegression = logreg_e6ccb54d045e
  1020.  
  1021. scala> val model = logReg.fit(hashed)
  1022. 16/10/18 16:27:33 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
  1023. 16/10/18 16:27:33 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
  1024. model: org.apache.spark.ml.classification.LogisticRegressionModel = logreg_e6ccb54d045e
  1025.  
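// fit(hashed) worked without any setters because LogisticRegression looks for the default
// column names "features" and "label", both present in `hashed`. Hyperparameters can be set
// on the estimator before fitting; a minimal sketch with illustrative values (not the
// settings used in this session):
val tunedLogReg = new LogisticRegression()
  .setMaxIter(50)           // cap on optimizer iterations
  .setRegParam(0.01)        // regularization strength
  .setElasticNetParam(0.0)  // 0.0 = pure L2, 1.0 = pure L1
val tunedModel = tunedLogReg.fit(hashed)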
  1026. scala> model
  1027. res53: org.apache.spark.ml.classification.LogisticRegressionModel = logreg_e6ccb54d045e
  1028.  
  1029. scala> model.
  1030. clear             featuresCol          getMaxIter            getThreshold    intercept     parent             setParent             threshold         weightCol  
  1031. coefficients      fitIntercept         getOrDefault          getThresholds   isDefined     predictionCol      setPredictionCol      thresholds        write      
  1032. copy              get                  getParam              getTol          isSet         probabilityCol     setProbabilityCol     toString                      
  1033. elasticNetParam   getDefault           getPredictionCol      getWeightCol    labelCol      rawPredictionCol   setRawPredictionCol   tol                          
  1034. evaluate          getElasticNetParam   getProbabilityCol     hasDefault      maxIter       regParam           setThreshold          transform                    
  1035. explainParam      getFeaturesCol       getRawPredictionCol   hasParam        numClasses    save               setThresholds         transformSchema              
  1036. explainParams     getFitIntercept      getRegParam           hasParent       numFeatures   set                standardization       uid                          
  1037. extractParamMap   getLabelCol          getStandardization    hasSummary      params        setFeaturesCol     summary               validateParams                
  1038.  
  1039. scala> model.transform
  1040. transform   transformSchema
  1041.  
  1042. scala> model.transform
  1043.  
  1044. override def transform(dataset: org.apache.spark.sql.Dataset[_]): org.apache.spark.sql.DataFrame
  1045. def transform(dataset: org.apache.spark.sql.Dataset[_],paramMap: org.apache.spark.ml.param.ParamMap): org.apache.spark.sql.DataFrame
  1046. def transform(dataset: org.apache.spark.sql.Dataset[_],firstParamPair: org.apache.spark.ml.param.ParamPair[_],otherParamPairs: org.apache.spark.ml.param.ParamPair[_]*): org.apache.spark.sql.DataFrame
  1047.  
  1048. scala> model.transform(hashed).select('label, 'prediction).show
  1049. +-----+----------+
  1050. |label|prediction|
  1051. +-----+----------+
  1052. |    1|       1.0|
  1053. |    0|       0.0|
  1054. |    0|       0.0|
  1055. |    1|       1.0|
  1056. +-----+----------+
  1057.  
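// The label/prediction pairs above agree on all four training rows, but eyeballing a table does
// not scale. A minimal evaluation sketch (not from the session) using the Spark-bundled
// BinaryClassificationEvaluator; the label is cast to double to stay on the safe side of the
// evaluator's column-type checks, and this is the training set, so the score is optimistic:
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.sql.functions.col

val scored = model.transform(hashed).withColumn("label", col("label").cast("double"))
val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("rawPrediction")
  .setMetricName("areaUnderROC")
println(s"Training-set AUC = ${evaluator.evaluate(scored)}")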
  1058.  
  1059. scala> //microsoft hdinsight
  1060.  
  1061. scala> val pipeline = new Pipeline().setStages(Array(tok, hashTF, logReg))
  1062. pipeline: org.apache.spark.ml.Pipeline = pipeline_e47df7c4f800
  1063.  
  1064. scala> val model = pipeline.fit(emails)
  1065. model: org.apache.spark.ml.PipelineModel = pipeline_e47df7c4f800
  1066.  
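// The Pipeline chains tok -> hashTF -> logReg, so fit(emails) tokenizes, hashes and trains in
// one call and returns a PipelineModel whose stages are the fitted counterparts of each step.
// A minimal sketch (not from the session) of pulling the trained classifier back out:
import org.apache.spark.ml.classification.LogisticRegressionModel

val fittedLR = model.stages.last.asInstanceOf[LogisticRegressionModel]
println(s"numFeatures = ${fittedLR.numFeatures}, intercept = ${fittedLR.intercept}")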
  1067. scala> val testing = Seq(
  1068.      |  "To nie jest SPAM",
  1069.      |  "To jest").toDF("email")
  1070. testing: org.apache.spark.sql.DataFrame = [email: string]
  1071.  
  1072. scala> model.transform(testing).show(false)
  1073. +----------------+------------------------+--------------------------------------------------------+------------------------------------------+-----------------------------------------+----------+
  1074. |email           |tok_680b50261bfe__output|features                                                |rawPrediction                             |probability                              |prediction|
  1075. +----------------+------------------------+--------------------------------------------------------+------------------------------------------+-----------------------------------------+----------+
  1076. |To nie jest SPAM|[to, nie, jest, spam]   |(262144,[152658,197793,205044,215930],[1.0,1.0,1.0,1.0])|[-6.102587858518542,6.102587858518542]    |[0.002232077682682062,0.9977679223173178]|1.0       |
  1077. |To jest         |[to, jest]              |(262144,[205044,215930],[1.0,1.0])                      |[-0.24283585831430904,0.24283585831430904]|[0.4395876168221684,0.5604123831778316]  |1.0       |
  1078. +----------------+------------------------+--------------------------------------------------------+------------------------------------------+-----------------------------------------+----------+
  1079.  
  1080.  
  1081. scala> model.transform(testing).select('email, 'prediction).show(false)
  1082. +----------------+----------+
  1083. |email           |prediction|
  1084. +----------------+----------+
  1085. |To nie jest SPAM|1.0       |
  1086. |To jest         |1.0       |
  1087. +----------------+----------+
  1088.  
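// Both test e-mails come out as 1.0 because the default decision threshold is 0.5 and even the
// borderline "To jest" row has P(spam) of roughly 0.56. A minimal sketch (not from the session)
// of raising the threshold on the fitted stage; 0.7 is an illustrative value, and setThreshold
// mutates the stage in place:
import org.apache.spark.ml.classification.LogisticRegressionModel

model.stages.last.asInstanceOf[LogisticRegressionModel].setThreshold(0.7)
model.transform(testing).select("email", "prediction").show(false)  // "To jest" should now score 0.0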
  1089.  
  1090. scala> model.save.
  1091. clear   explainParam    extractParamMap   getDefault     getParam     hasParam    isDefined   params   save   setParent   toString    transformSchema   validateParams  
  1092. copy    explainParams   get               getOrDefault   hasDefault   hasParent   isSet       parent   set    stages      transform   uid               write            
  1093.  
  1094. scala> model.write.
  1095. context   overwrite   save   session
  1096.  
  1097. scala> model.write.save("moj-model-sopocki")
  1098. SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
  1099. SLF4J: Defaulting to no-operation (NOP) logger implementation
  1100. SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
  1101.  
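// The directory "moj-model-sopocki" now holds the pipeline metadata plus the data of every
// stage, so the whole model can be reloaded in another session. A minimal sketch (not from the
// session), assuming the same path is still on disk:
import org.apache.spark.ml.PipelineModel

val reloaded = PipelineModel.load("moj-model-sopocki")
reloaded.transform(testing).select("email", "prediction").show(false)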
  1102. scala> //jpmml
  1103.  
  1104. scala>