- Last login: Tue Oct 18 09:22:09 on ttys001
- fireelemental:ApacheZeppelin konradm$ ll
- total 144
- -rw-r--r-- 1 konradm KAINOS\Domain Users 15K Sep 13 15:40 LICENSE
- -rw-r--r-- 1 konradm KAINOS\Domain Users 253B Sep 13 15:40 NOTICE
- -rw-r--r-- 1 konradm KAINOS\Domain Users 9.1K Sep 13 15:40 README.md
- -rw-r--r-- 1 konradm KAINOS\Domain Users 115B Sep 13 15:40 Roadmap.md
- -rw-r--r-- 1 konradm KAINOS\Domain Users 2.7K Sep 13 15:40 SECURITY-README.md
- -rw-r--r-- 1 konradm KAINOS\Domain Users 2.8K Sep 13 15:40 STYLE.md
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 15:40 _tools
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:32 alluxio
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:29 angular
- drwxr-xr-x 6 konradm KAINOS\Domain Users 204B Sep 13 16:32 bigquery
- drwxr-xr-x 12 konradm KAINOS\Domain Users 408B Sep 13 15:40 bin
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:31 cassandra
- drwxr-xr-x 15 konradm KAINOS\Domain Users 510B Sep 21 14:36 conf
- drwxr-xr-x 9 konradm KAINOS\Domain Users 306B Sep 13 15:40 dev
- drwxr-xr-x 30 konradm KAINOS\Domain Users 1.0K Sep 13 15:40 docs
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:32 elasticsearch
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:30 file
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:30 flink
- drwxr-xr-x 4 konradm KAINOS\Domain Users 136B Sep 13 15:44 geode
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:29 hbase
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:30 ignite
- drwxr-xr-x 20 root KAINOS\Domain Users 680B Sep 13 16:32 interpreter
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:30 jdbc
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:30 kylin
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:30 lens
- drwxr-xr-x 14 konradm KAINOS\Domain Users 476B Sep 13 15:40 licenses
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:29 livy
- drwxr-xr-x 4 konradm KAINOS\Domain Users 136B Sep 28 19:25 local-repo
- drwxr-xr-x 47 konradm KAINOS\Domain Users 1.6K Oct 4 21:41 logs
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:29 markdown
- drwxr-xr-x 9 konradm KAINOS\Domain Users 306B Oct 4 21:40 notebook
- -rw-r--r-- 1 konradm KAINOS\Domain Users 26K Sep 13 15:44 pom.xml
- drwxr-xr-x 6 konradm KAINOS\Domain Users 204B Sep 13 16:29 postgresql
- drwxr-xr-x 6 konradm KAINOS\Domain Users 204B Sep 13 16:30 python
- drwxr-xr-x 6 konradm KAINOS\Domain Users 204B Sep 13 15:44 r
- drwxr-xr-x 2 konradm KAINOS\Domain Users 68B Oct 4 23:31 run
- drwxr-xr-x 4 konradm KAINOS\Domain Users 136B Sep 13 15:44 scalding
- drwxr-xr-x 4 konradm KAINOS\Domain Users 136B Sep 13 15:40 scripts
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:29 shell
- drwxr-xr-x 6 konradm KAINOS\Domain Users 204B Sep 13 16:29 spark
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:28 spark-dependencies
- drwxr-xr-x 7 root KAINOS\Domain Users 238B Sep 13 16:26 target
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 15:40 testing
- drwxr-xr-x 3 konradm KAINOS\Domain Users 102B Sep 13 15:47 var
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:26 zeppelin-display
- drwxr-xr-x 7 konradm KAINOS\Domain Users 238B Sep 13 16:36 zeppelin-distribution
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 15:44 zeppelin-examples
- drwxr-xr-x 6 konradm KAINOS\Domain Users 204B Sep 13 16:26 zeppelin-interpreter
- drwxr-xr-x 5 konradm KAINOS\Domain Users 170B Sep 13 16:35 zeppelin-server
- drwxr-xr-x 22 konradm KAINOS\Domain Users 748B Sep 13 16:35 zeppelin-web
- drwxr-xr-x 7 konradm KAINOS\Domain Users 238B Sep 13 16:26 zeppelin-zengine
- fireelemental:ApacheZeppelin konradm$ cd shell/
- fireelemental:shell konradm$ ll
- total 16
- -rw-r--r-- 1 konradm KAINOS\Domain Users 4.1K Sep 13 15:44 pom.xml
- drwxr-xr-x 4 konradm KAINOS\Domain Users 136B Sep 13 15:40 src
- drwxr-xr-x 15 root KAINOS\Domain Users 510B Sep 13 16:29 target
- fireelemental:shell konradm$ cd src/
- fireelemental:src konradm$ ll
- total 0
- drwxr-xr-x 4 konradm KAINOS\Domain Users 136B Sep 13 15:40 main
- drwxr-xr-x 3 konradm KAINOS\Domain Users 102B Sep 13 15:40 test
- fireelemental:src konradm$ cd main/\
- >
- fireelemental:src konradm$ cd main/
- fireelemental:main konradm$ ll
- total 0
- drwxr-xr-x 3 konradm KAINOS\Domain Users 102B Sep 13 15:40 java
- drwxr-xr-x 3 konradm KAINOS\Domain Users 102B Sep 13 15:40 resources
- fireelemental:main konradm$ cd java
- fireelemental:java konradm$ ll
- total 0
- drwxr-xr-x 3 konradm KAINOS\Domain Users 102B Sep 13 15:40 org
- fireelemental:java konradm$ cd org/apache/zeppelin/shell/
- fireelemental:shell konradm$ ll
- total 16
- -rw-r--r-- 1 konradm KAINOS\Domain Users 5.2K Sep 13 15:40 ShellInterpreter.java
- drwxr-xr-x 3 konradm KAINOS\Domain Users 102B Sep 13 15:40 security
- fireelemental:shell konradm$ cd $ZEPPELIN
- $ZEPPELIN $ZEPPELIN_HOME
- fireelemental:shell konradm$ cd $ZEPPELIN
- $ZEPPELIN $ZEPPELIN_HOME
- fireelemental:shell konradm$ cd $ZEPPELIN
- -bash: cd: /Users/konradm/ApacheZeppelin/bin/zeppelin-daemon.sh: Not a directory
- fireelemental:shell konradm$ cd
- fireelemental:~ konradm$ cd ApacheZeppelin/bin
- fireelemental:bin konradm$ ll
- total 112
- -rw-r--r-- 1 konradm KAINOS\Domain Users 2.8K Sep 13 15:40 common.cmd
- -rw-r--r-- 1 konradm KAINOS\Domain Users 4.0K Sep 13 15:40 common.sh
- -rw-r--r-- 1 konradm KAINOS\Domain Users 1.2K Sep 13 15:40 functions.cmd
- -rw-r--r-- 1 konradm KAINOS\Domain Users 4.0K Sep 13 15:40 functions.sh
- -rwxr-xr-x 1 konradm KAINOS\Domain Users 1.7K Sep 13 15:40 install-interpreter.sh
- -rw-r--r-- 1 konradm KAINOS\Domain Users 5.0K Sep 13 15:40 interpreter.cmd
- -rwxr-xr-x 1 konradm KAINOS\Domain Users 6.0K Sep 13 15:40 interpreter.sh
- -rwxr-xr-x 1 konradm KAINOS\Domain Users 6.6K Sep 13 15:40 zeppelin-daemon.sh
- -rw-r--r-- 1 konradm KAINOS\Domain Users 3.1K Sep 13 15:40 zeppelin.cmd
- -rwxr-xr-x 1 konradm KAINOS\Domain Users 2.9K Sep 13 15:40 zeppelin.sh
- fireelemental:bin konradm$ more zeppelin-daemon.sh
- #!/bin/bash
- #
- # Licensed to the Apache Software Foundation (ASF) under one
- # or more contributor license agreements. See the NOTICE file
- # distributed with this work for additional information
- # regarding copyright ownership. The ASF licenses this file
- # to you under the Apache License, Version 2.0 (the
- # "License"); you may not use this file except in compliance
- # with the License. You may obtain a copy of the License at
- #
- # http://www.apache.org/licenses/LICENSE-2.0
- #
- # Unless required by applicable law or agreed to in writing, software
- # distributed under the License is distributed on an "AS IS" BASIS,
- # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- # See the License for the specific language governing permissions and
- # limitations under the License.
- #
- # description: Start and stop daemon script for.
- #
- USAGE="-e Usage: zeppelin-daemon.sh\n\t
- [--config <conf-dir>] {start|stop|upstart|restart|reload|status}\n\t
- [--version | -v]"
- if [[ "$1" == "--config" ]]; then
- shift
- conf_dir="$1"
- if [[ ! -d "${conf_dir}" ]]; then
- echo "ERROR : ${conf_dir} is not a directory"
- echo ${USAGE}
- exit 1
- else
- export ZEPPELIN_CONF_DIR="${conf_dir}"
- fi
- shift
- fi
- if [ -L ${BASH_SOURCE-$0} ]; then
- BIN=$(dirname $(readlink "${BASH_SOURCE-$0}"))
- else
- BIN=$(dirname ${BASH_SOURCE-$0})
- fi
- BIN=$(cd "${BIN}">/dev/null; pwd)
- . "${BIN}/common.sh"
- . "${BIN}/functions.sh"
- HOSTNAME=$(hostname)
- ZEPPELIN_NAME="Zeppelin"
- ZEPPELIN_LOGFILE="${ZEPPELIN_LOG_DIR}/zeppelin-${ZEPPELIN_IDENT_STRING}-${HOSTNAME}.log"
- ZEPPELIN_OUTFILE="${ZEPPELIN_LOG_DIR}/zeppelin-${ZEPPELIN_IDENT_STRING}-${HOSTNAME}.out"
- ZEPPELIN_PID="${ZEPPELIN_PID_DIR}/zeppelin-${ZEPPELIN_IDENT_STRING}-${HOSTNAME}.pid"
- ZEPPELIN_MAIN=org.apache.zeppelin.server.ZeppelinServer
- JAVA_OPTS+=" -Dzeppelin.log.file=${ZEPPELIN_LOGFILE}"
- # construct classpath
- if [[ -d "${ZEPPELIN_HOME}/zeppelin-interpreter/target/classes" ]]; then
- ZEPPELIN_CLASSPATH+=":${ZEPPELIN_HOME}/zeppelin-interpreter/target/classes"
- fi
- if [[ -d "${ZEPPELIN_HOME}/zeppelin-zengine/target/classes" ]]; then
- ZEPPELIN_CLASSPATH+=":${ZEPPELIN_HOME}/zeppelin-zengine/target/classes"
- fi
- if [[ -d "${ZEPPELIN_HOME}/zeppelin-server/target/classes" ]]; then
- ZEPPELIN_CLASSPATH+=":${ZEPPELIN_HOME}/zeppelin-server/target/classes"
- fi
- # Add jdbc connector jar
- # ZEPPELIN_CLASSPATH+=":${ZEPPELIN_HOME}/jdbc/jars/jdbc-connector-jar"
- addJarInDir "${ZEPPELIN_HOME}"
- addJarInDir "${ZEPPELIN_HOME}/lib"
- addJarInDir "${ZEPPELIN_HOME}/zeppelin-interpreter/target/lib"
- addJarInDir "${ZEPPELIN_HOME}/zeppelin-zengine/target/lib"
- addJarInDir "${ZEPPELIN_HOME}/zeppelin-server/target/lib"
- addJarInDir "${ZEPPELIN_HOME}/zeppelin-web/target/lib"
- CLASSPATH+=":${ZEPPELIN_CLASSPATH}"
- if [[ "${ZEPPELIN_NICENESS}" = "" ]]; then
- export ZEPPELIN_NICENESS=0
- fireelemental:bin konradm$ ll
- total 112
- -rw-r--r-- 1 konradm KAINOS\Domain Users 2.8K Sep 13 15:40 common.cmd
- -rw-r--r-- 1 konradm KAINOS\Domain Users 4.0K Sep 13 15:40 common.sh
- -rw-r--r-- 1 konradm KAINOS\Domain Users 1.2K Sep 13 15:40 functions.cmd
- -rw-r--r-- 1 konradm KAINOS\Domain Users 4.0K Sep 13 15:40 functions.sh
- -rwxr-xr-x 1 konradm KAINOS\Domain Users 1.7K Sep 13 15:40 install-interpreter.sh
- -rw-r--r-- 1 konradm KAINOS\Domain Users 5.0K Sep 13 15:40 interpreter.cmd
- -rwxr-xr-x 1 konradm KAINOS\Domain Users 6.0K Sep 13 15:40 interpreter.sh
- -rwxr-xr-x 1 konradm KAINOS\Domain Users 6.6K Sep 13 15:40 zeppelin-daemon.sh
- -rw-r--r-- 1 konradm KAINOS\Domain Users 3.1K Sep 13 15:40 zeppelin.cmd
- -rwxr-xr-x 1 konradm KAINOS\Domain Users 2.9K Sep 13 15:40 zeppelin.sh
- fireelemental:bin konradm$ ./zeppelin-daemon.sh start
- Zeppelin start [ OK ]
- fireelemental:bin konradm$ ./zeppelin-daemon.sh stop
- Zeppelin stop [ OK ]
- fireelemental:bin konradm$ spark-shell
- Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
- Setting default log level to "WARN".
- To adjust logging level use sc.setLogLevel(newLevel).
- 16/10/18 12:19:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- 16/10/18 12:19:43 WARN Utils: Your hostname, fireelemental.local resolves to a loopback address: 127.0.0.1, but we couldn't find any external IP address!
- 16/10/18 12:19:43 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
- 16/10/18 12:19:44 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
- Spark context Web UI available at http://127.0.0.1:4040
- Spark context available as 'sc' (master = local[*], app id = local-1476785983992).
- Spark session available as 'spark'.
- Welcome to
-       ____              __
-      / __/__  ___ _____/ /__
-     _\ \/ _ \/ _ `/ __/ '_/
-    /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
-       /_/
- Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
- Type in expressions to have them evaluated.
- Type :help for more information.
- scala> :q
- fireelemental:bin konradm$ sbt
- [info] Set current project to bin (in build file:/Users/konradm/ApacheZeppelin/bin/)
- > :q
- [error] Expected symbol
- [error] Not a valid command: :
- [error] :q
- [error] ^
- > fireelemental:bin konradm$
- fireelemental:bin konradm$
- fireelemental:bin konradm$ ll
- total 112
- -rw-r--r-- 1 konradm KAINOS\Domain Users 2.8K Sep 13 15:40 common.cmd
- -rw-r--r-- 1 konradm KAINOS\Domain Users 4.0K Sep 13 15:40 common.sh
- -rw-r--r-- 1 konradm KAINOS\Domain Users 1.2K Sep 13 15:40 functions.cmd
- -rw-r--r-- 1 konradm KAINOS\Domain Users 4.0K Sep 13 15:40 functions.sh
- -rwxr-xr-x 1 konradm KAINOS\Domain Users 1.7K Sep 13 15:40 install-interpreter.sh
- -rw-r--r-- 1 konradm KAINOS\Domain Users 5.0K Sep 13 15:40 interpreter.cmd
- -rwxr-xr-x 1 konradm KAINOS\Domain Users 6.0K Sep 13 15:40 interpreter.sh
- drwxr-xr-x 3 konradm KAINOS\Domain Users 102B Oct 18 12:19 target
- -rwxr-xr-x 1 konradm KAINOS\Domain Users 6.6K Sep 13 15:40 zeppelin-daemon.sh
- -rw-r--r-- 1 konradm KAINOS\Domain Users 3.1K Sep 13 15:40 zeppelin.cmd
- -rwxr-xr-x 1 konradm KAINOS\Domain Users 2.9K Sep 13 15:40 zeppelin.sh
- fireelemental:bin konradm$ pwd
- /Users/konradm/ApacheZeppelin/bin
- fireelemental:bin konradm$ cd
- fireelemental:~ konradm$ ll
- total 3936
- drwxr-xr-x 3 konradm KAINOS\Domain Users 102B Jul 16 21:23 AndroidStudioProjects
- drwxr-xr-x 57 konradm KAINOS\Domain Users 1.9K Sep 28 19:25 ApacheZeppelin
- drwx------ 6 konradm KAINOS\Domain Users 204B Oct 16 18:16 Applications
- drwx------+ 21 konradm KAINOS\Domain Users 714B Oct 17 21:16 Desktop
- drwxr-xr-x 3 konradm KAINOS\Domain Users 102B Oct 11 13:33 DockerImages
- drwx------+ 18 konradm KAINOS\Domain Users 612B Oct 14 22:39 Documents
- drwx------+ 17 konradm KAINOS\Domain Users 578B Oct 18 09:32 Downloads
- drwxr-xr-x 8 konradm KAINOS\Domain Users 272B Sep 21 12:01 IdeaProjects
- drwx------@ 62 konradm KAINOS\Domain Users 2.1K Aug 12 23:03 Library
- drwx------+ 3 konradm KAINOS\Domain Users 102B Jun 30 14:38 Movies
- drwx------+ 4 konradm KAINOS\Domain Users 136B Jul 16 21:31 Music
- drwx------+ 5 konradm KAINOS\Domain Users 170B Jul 30 11:42 Pictures
- drwxr-xr-x 13 konradm KAINOS\Domain Users 442B Oct 17 15:57 PolishMediaTextMining-ShinyVis
- drwxr-xr-x+ 5 konradm KAINOS\Domain Users 170B Jun 30 14:38 Public
- drwx------ 4 konradm KAINOS\Domain Users 136B Sep 28 18:55 VirtualBox VMs
- drwxr-xr-x 14 konradm KAINOS\Domain Users 476B Sep 8 17:11 anaconda
- -rw-r--r-- 1 konradm KAINOS\Domain Users 678B Sep 20 15:48 derby.log
- drwxr-xr-x 9 konradm KAINOS\Domain Users 306B Sep 20 15:48 metastore_db
- -rw-r--r-- 1 konradm KAINOS\Domain Users 8.5K Jul 4 16:41 npm-debug.log
- -rw-r--r-- 1 konradm KAINOS\Domain Users 920K Sep 15 10:48 rodeo.log
- -rw-r--r-- 1 konradm KAINOS\Domain Users 1.0M Sep 12 12:27 rodeo1.log
- drwxr-xr-x 6 konradm KAINOS\Domain Users 204B Jul 2 00:39 target
- fireelemental:~ konradm$ ll
- total 3936
- drwxr-xr-x 3 konradm KAINOS\Domain Users 102B Jul 16 21:23 AndroidStudioProjects
- drwxr-xr-x 57 konradm KAINOS\Domain Users 1.9K Sep 28 19:25 ApacheZeppelin
- drwx------ 6 konradm KAINOS\Domain Users 204B Oct 16 18:16 Applications
- drwx------+ 21 konradm KAINOS\Domain Users 714B Oct 17 21:16 Desktop
- drwxr-xr-x 3 konradm KAINOS\Domain Users 102B Oct 11 13:33 DockerImages
- drwx------+ 18 konradm KAINOS\Domain Users 612B Oct 14 22:39 Documents
- drwx------+ 17 konradm KAINOS\Domain Users 578B Oct 18 09:32 Downloads
- drwxr-xr-x 8 konradm KAINOS\Domain Users 272B Sep 21 12:01 IdeaProjects
- drwx------@ 62 konradm KAINOS\Domain Users 2.1K Aug 12 23:03 Library
- drwx------+ 3 konradm KAINOS\Domain Users 102B Jun 30 14:38 Movies
- drwx------+ 4 konradm KAINOS\Domain Users 136B Jul 16 21:31 Music
- drwx------+ 5 konradm KAINOS\Domain Users 170B Jul 30 11:42 Pictures
- drwxr-xr-x 13 konradm KAINOS\Domain Users 442B Oct 17 15:57 PolishMediaTextMining-ShinyVis
- drwxr-xr-x+ 5 konradm KAINOS\Domain Users 170B Jun 30 14:38 Public
- drwx------ 4 konradm KAINOS\Domain Users 136B Sep 28 18:55 VirtualBox VMs
- drwxr-xr-x 14 konradm KAINOS\Domain Users 476B Sep 8 17:11 anaconda
- -rw-r--r-- 1 konradm KAINOS\Domain Users 678B Sep 20 15:48 derby.log
- drwxr-xr-x 9 konradm KAINOS\Domain Users 306B Sep 20 15:48 metastore_db
- -rw-r--r-- 1 konradm KAINOS\Domain Users 8.5K Jul 4 16:41 npm-debug.log
- -rw-r--r-- 1 konradm KAINOS\Domain Users 920K Sep 15 10:48 rodeo.log
- -rw-r--r-- 1 konradm KAINOS\Domain Users 1.0M Sep 12 12:27 rodeo1.log
- drwxr-xr-x 6 konradm KAINOS\Domain Users 204B Jul 2 00:39 target
- fireelemental:~ konradm$
- fireelemental:~ konradm$ cd
- fireelemental:~ konradm$ spark-shell
- Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
- Setting default log level to "WARN".
- To adjust logging level use sc.setLogLevel(newLevel).
- 16/10/18 12:59:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- 16/10/18 12:59:17 WARN Utils: Your hostname, fireelemental.local resolves to a loopback address: 127.0.0.1, but we couldn't find any external IP address!
- 16/10/18 12:59:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
- 16/10/18 12:59:18 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
- Spark context Web UI available at http://127.0.0.1:4040
- Spark context available as 'sc' (master = local[*], app id = local-1476788358268).
- Spark session available as 'spark'.
- Welcome to
-       ____              __
-      / __/__  ___ _____/ /__
-     _\ \/ _ \/ _ `/ __/ '_/
-    /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
-       /_/
- Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
- Type in expressions to have them evaluated.
- Type :help for more information.
- scala> sc.version
- res0: String = 2.0.0
- scala> :import
- 1) import spark.implicits._ (59 terms, 38 are implicit)
- 2) import spark.sql (1 terms)
- scala> spark
- res1: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@2c579202
- scala> spark.
- baseRelationToDataFrame conf createDataset emptyDataset implicits newSession read sparkContext sqlContext streams udf
- catalog createDataFrame emptyDataFrame experimental listenerManager range readStream sql stop table version
- scala> spark.
- != == conf emptyDataset experimental implicits newSession read sqlContext table wait
- ## asInstanceOf createDataFrame ensuring formatted isInstanceOf notify readStream stop toString →
- + baseRelationToDataFrame createDataset eq getClass listenerManager notifyAll sparkContext streams udf
- -> catalog emptyDataFrame equals hashCode ne range sql synchronized version
- scala> sc.
- accumulable broadcast files hadoopConfiguration makeRDD sequenceFile submitJob
- accumulableCollection cancelAllJobs getAllPools hadoopFile master setCallSite textFile
- accumulator cancelJobGroup getCheckpointDir hadoopRDD newAPIHadoopFile setCheckpointDir uiWebUrl
- addFile clearCallSite getConf isLocal newAPIHadoopRDD setJobDescription union
- addJar clearJobGroup getExecutorMemoryStatus isStopped objectFile setJobGroup version
- addSparkListener collectionAccumulator getExecutorStorageStatus jars parallelize setLocalProperty wholeTextFiles
- appName defaultMinPartitions getLocalProperty killExecutor range setLogLevel
- applicationAttemptId defaultParallelism getPersistentRDDs killExecutors register sparkUser
- applicationId deployMode getPoolForName listFiles requestExecutors startTime
- binaryFiles doubleAccumulator getRDDStorageInfo listJars runApproximateJob statusTracker
- binaryRecords emptyRDD getSchedulingMode longAccumulator runJob stop
- scala> spark.udf
- res2: org.apache.spark.sql.UDFRegistration = org.apache.spark.sql.UDFRegistration@74832504
- scala> sp
- spark spark_partition_id specialized spire split
- scala> spark.sqlContext.
- applySchema createDataFrame emptyDataFrame implicits jsonRDD parquetFile setConf streams udf
- baseRelationToDataFrame createDataset experimental isCached listenerManager range sparkContext table uncacheTable
- cacheTable createExternalTable getAllConfs jdbc load read sparkSession tableNames
- clearCache dropTempTable getConf jsonFile newSession readStream sql tables
- scala> spark.read.
- csv format jdbc json load option options orc parquet schema table text textFile
- scala> spark.read.
- != + == csv eq format getClass isInstanceOf json ne notifyAll options parquet synchronized text toString →
- ## -> asInstanceOf ensuring equals formatted hashCode jdbc load notify option orc schema table textFile wait
- scala> spark.read.csv.
- csv format jdbc json load option options orc parquet schema table text textFile
- scala> spark.read.csv.
- != + == csv eq format getClass isInstanceOf json ne notifyAll options parquet synchronized text toString →
- ## -> asInstanceOf ensuring equals formatted hashCode jdbc load notify option orc schema table textFile wait
- scala> spark.read.csv
- def csv(paths: String*): org.apache.spark.sql.DataFrame def csv(path: String): org.apache.spark.sql.DataFrame
- scala> spark.read.csv.
- csv format jdbc json load option options orc parquet schema table text textFile
- scala> spark.read.csv.
- != + == csv eq format getClass isInstanceOf json ne notifyAll options parquet synchronized text toString →
- ## -> asInstanceOf ensuring equals formatted hashCode jdbc load notify option orc schema table textFile wait
- scala> spark.read.csv
- def csv(paths: String*): org.apache.spark.sql.DataFrame def csv(path: String): org.apache.spark.sql.DataFrame
- scala> spark.read.csv("/Users/konradm/Desktop/people.csv")
- res3: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 1 more field]
- scala> val frm = spark.read.csv("/Users/konradm/Desktop/people.csv")
- frm: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 1 more field]
- scala> frm.show
- +---+------+----+
- |_c0| _c1| _c2|
- +---+------+----+
- | id| name| age|
- | 0| Jacek| 42|
- | 1|Unmesh| 10|
- | 2|Ulrich| 12|
- | 3|Ulrich|1220|
- | 4|Ullash|1220|
- +---+------+----+
- scala> val frm = spark.read.option("header", true).csv("/Users/konradm/Desktop/people.csv")
- frm: org.apache.spark.sql.DataFrame = [id: string, name: string ... 1 more field]
- scala> frm.show
- +---+------+----+
- | id| name| age|
- +---+------+----+
- | 0| Jacek| 42|
- | 1|Unmesh| 10|
- | 2|Ulrich| 12|
- | 3|Ulrich|1220|
- | 4|Ullash|1220|
- +---+------+----+
- scala> select * from frm
- <console>:24: error: not found: value select
- select * from frm
- ^
- <console>:24: error: not found: value from
- select * from frm
- ^
- scala> frm.create
- createOrReplaceTempView createTempView
- scala> frm.createOrReplaceTempView("mojCSV")
- scala> sql("select * from mojCSV").show
- +---+------+----+
- | id| name| age|
- +---+------+----+
- | 0| Jacek| 42|
- | 1|Unmesh| 10|
- | 2|Ulrich| 12|
- | 3|Ulrich|1220|
- | 4|Ullash|1220|
- +---+------+----+
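For reference, the read-and-query flow above condenses to a few lines. A minimal sketch, assuming the same people.csv path used in this session and spark-shell's default imports (spark, spark.sql):

val people = spark.read
  .option("header", true)                       // first CSV row becomes the column names
  .csv("/Users/konradm/Desktop/people.csv")
people.createOrReplaceTempView("mojCSV")        // expose the DataFrame to SQL
spark.sql("select * from mojCSV").show()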
- scala> case class Person(id: Long, name: String, age: Int)
- defined class Person
- scala> frm.filter(p: Person => p.id == 2)
- <console>:1: error: identifier expected but integer literal found.
- frm.filter(p: Person => p.id == 2)
- ^
- scala> frm.filter{p: Person => p.id == 2}
- <console>:28: error: type mismatch;
- found : Person => Boolean
- required: org.apache.spark.sql.Row => Boolean
- frm.filter{p: Person => p.id == 2}
- ^
- scala> sql("select * from mojCSV").as[Person]
- org.apache.spark.sql.AnalysisException: Cannot up cast mojcsv.`id` from string to bigint as it may truncate
- The type path of the target object is:
- - field (class: "scala.Long", name: "id")
- - root class: "Person"
- You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
- at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:1990)
- at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$34$$anonfun$applyOrElse$14.applyOrElse(Analyzer.scala:2020)
- at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$34$$anonfun$applyOrElse$14.applyOrElse(Analyzer.scala:2007)
- at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279)
- at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279)
- at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
- at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278)
- at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284)
- at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284)
- at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321)
- at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
- at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319)
- at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284)
- at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284)
- at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284)
- at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5$$anonfun$apply$11.apply(TreeNode.scala:350)
- at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
- at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
- at scala.collection.immutable.List.foreach(List.scala:381)
- at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
- at scala.collection.immutable.List.map(List.scala:285)
- at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:348)
- at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
- at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319)
- at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284)
- at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:156)
- at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:166)
- at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$4.apply(QueryPlan.scala:175)
- at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
- at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:175)
- at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressions(QueryPlan.scala:144)
- at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$34.applyOrElse(Analyzer.scala:2007)
- at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$34.applyOrElse(Analyzer.scala:2003)
- at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
- at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
- at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
- at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:60)
- at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.apply(Analyzer.scala:2003)
- at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.apply(Analyzer.scala:1988)
- at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
- at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
- at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
- at scala.collection.immutable.List.foldLeft(List.scala:84)
- at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
- at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
- at scala.collection.immutable.List.foreach(List.scala:381)
- at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
- at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolveAndBind(ExpressionEncoder.scala:244)
- at org.apache.spark.sql.Dataset.<init>(Dataset.scala:210)
- at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
- at org.apache.spark.sql.Dataset$.apply(Dataset.scala:59)
- at org.apache.spark.sql.Dataset.as(Dataset.scala:359)
- ... 48 elided
- scala> val frm = spark.read.option("header", true).option("inferSchema", true).csv("/Users/konradm/Desktop/people.csv")
- frm: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
- scala> frm.printSchema
- root
- |-- id: integer (nullable = true)
- |-- name: string (nullable = true)
- |-- age: integer (nullable = true)
- scala> frm.as[Person]
- res12: org.apache.spark.sql.Dataset[Person] = [id: int, name: string ... 1 more field]
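The earlier AnalysisException happened because, without inferSchema, every CSV column is read as string and cannot be up-cast to Person's Long/Int fields. With inferred numeric types the typed view works; a minimal sketch, reusing the session's Person case class and path:

case class Person(id: Long, name: String, age: Int)

val frm = spark.read
  .option("header", true)
  .option("inferSchema", true)                  // id/age come back as integers, not strings
  .csv("/Users/konradm/Desktop/people.csv")
val people = frm.as[Person]                     // Dataset[Person]; encoder comes from spark.implicits._
people.show()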
- scala> frm.filter(p => p.int == 2)
- <console>:26: error: value int is not a member of org.apache.spark.sql.Row
- frm.filter(p => p.int == 2)
- ^
- scala> frm.filter(p => p.id == 2)
- <console>:26: error: value id is not a member of org.apache.spark.sql.Row
- frm.filter(p => p.id == 2)
- ^
- scala> frm.where
- def where(conditionExpr: String): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
- def where(condition: org.apache.spark.sql.Column): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
- scala> frm.filter
- def filter(func: org.apache.spark.api.java.function.FilterFunction[org.apache.spark.sql.Row]): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
- def filter(func: org.apache.spark.sql.Row => Boolean): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
- def filter(conditionExpr: String): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
- def filter(condition: org.apache.spark.sql.Column): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
- scala> frm.filter(p: Person => p.id == 2)
- <console>:1: error: identifier expected but integer literal found.
- frm.filter(p: Person => p.id == 2)
- ^
- scala> frm.filter(p: Person => p.id === 2)
- <console>:1: error: identifier expected but integer literal found.
- frm.filter(p: Person => p.id === 2)
- ^
- scala> frm.filter(p: Person => p.id = 2)
- <console>:1: error: ')' expected but '=' found.
- frm.filter(p: Person => p.id = 2)
- ^
- scala> frm.filter(p: Person => p.id == 2)
- <console>:1: error: identifier expected but integer literal found.
- frm.filter(p: Person => p.id == 2)
- ^
- scala> frm
- res15: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
- scala> frm.printSchema
- root
- |-- id: integer (nullable = true)
- |-- name: string (nullable = true)
- |-- age: integer (nullable = true)
- scala> frm.filter
- def filter(func: org.apache.spark.api.java.function.FilterFunction[org.apache.spark.sql.Row]): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
- def filter(func: org.apache.spark.sql.Row => Boolean): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
- def filter(conditionExpr: String): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
- def filter(condition: org.apache.spark.sql.Column): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
- scala> frm.filter($"age" > 40)
- res17: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [id: int, name: string ... 1 more field]
- scala> frm.filter($"age" > 40).show
- +---+------+----+
- | id| name| age|
- +---+------+----+
- | 0| Jacek| 42|
- | 3|Ulrich|1220|
- | 4|Ullash|1220|
- +---+------+----+
- scala> frm.filter($"age".> 40).show
- <console>:1: error: ')' expected but integer literal found.
- frm.filter($"age".> 40).show
- ^
- scala> frm.filter($"age" > 40).show
- +---+------+----+
- | id| name| age|
- +---+------+----+
- | 0| Jacek| 42|
- | 3|Ulrich|1220|
- | 4|Ullash|1220|
- +---+------+----+
- scala> frm.filter('age > 40).show
- +---+------+----+
- | id| name| age|
- +---+------+----+
- | 0| Jacek| 42|
- | 3|Ulrich|1220|
- | 4|Ullash|1220|
- +---+------+----+
- scala> frm.filter(_.age > 40).show
- <console>:26: error: value age is not a member of org.apache.spark.sql.Row
- frm.filter(_.age > 40).show
- ^
- scala> frm.filter(p => p.age > 40).show
- <console>:26: error: value age is not a member of org.apache.spark.sql.Row
- frm.filter(p => p.age > 40).show
- ^
- scala> frm
- res23: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
- scala> frm.as[Person].filter(_.age > 40)
- res24: org.apache.spark.sql.Dataset[Person] = [id: int, name: string ... 1 more field]
- scala> frm.as[Person].filter(p => p.age > 40)
- res25: org.apache.spark.sql.Dataset[Person] = [id: int, name: string ... 1 more field]
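To summarize the filtering attempts: frm is a DataFrame, i.e. Dataset[Row], so lambdas such as p.age fail to compile because Row has no age member. Either filter with Column/SQL expressions or convert to the typed Dataset first. A short recap sketch using the names defined above:

frm.filter($"age" > 40).show()                  // Column expression
frm.filter("age > 40").show()                   // SQL expression string
frm.as[Person].filter(_.age > 40).show()        // typed Dataset[Person] lambda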
- scala> frm.as[Person].groupBy(p => p.age > 40).show
- <console>:28: error: missing parameter type
- frm.as[Person].groupBy(p => p.age > 40).show
- ^
- scala> frm.as[Person].groupByKey
- def groupByKey[K](func: org.apache.spark.api.java.function.MapFunction[Person,K],encoder: org.apache.spark.sql.Encoder[K]): org.apache.spark.sql.KeyValueGroupedDataset[K,Person]
- def groupByKey[K](func: Person => K)(implicit evidence$4: org.apache.spark.sql.Encoder[K]): org.apache.spark.sql.KeyValueGroupedDataset[K,Person]
- scala> frm.as[Person].groupByKey(_.age > 40).show
- <console>:28: error: value show is not a member of org.apache.spark.sql.KeyValueGroupedDataset[Boolean,Person]
- frm.as[Person].groupByKey(_.age > 40).show
- ^
- scala> frm.as[Person].groupByKey(_.age > 40)
- res28: org.apache.spark.sql.KeyValueGroupedDataset[Boolean,Person] = org.apache.spark.sql.KeyValueGroupedDataset@3a66ac76
- scala> frm.as[Person].groupByKey(_.age > 40).count
- res29: org.apache.spark.sql.Dataset[(Boolean, Long)] = [value: boolean, count(1): bigint]
- scala> frm.as[Person].groupByKey(_.age > 40).count.show
- +-----+--------+
- |value|count(1)|
- +-----+--------+
- | true| 3|
- |false| 2|
- +-----+--------+
- scala> frm.as[Person].reduceByKey(_.age > 40).count.show
- <console>:28: error: value reduceByKey is not a member of org.apache.spark.sql.Dataset[Person]
- frm.as[Person].reduceByKey(_.age > 40).count.show
- ^
- scala> frm.as[Person].groupByKey(_.age > 40).count.show
- +-----+--------+
- |value|count(1)|
- +-----+--------+
- | true| 3|
- |false| 2|
- +-----+--------+
- scala> frm.as[Person].groupByKey(identity).count.show
- +---------------+--------+
- | key|count(1)|
- +---------------+--------+
- | [1,Unmesh,10]| 1|
- |[3,Ulrich,1220]| 1|
- | [0,Jacek,42]| 1|
- | [2,Ulrich,12]| 1|
- |[4,Ullash,1220]| 1|
- +---------------+--------+
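Grouping behaves the same way: groupByKey on the typed Dataset returns a KeyValueGroupedDataset, which has no show of its own, so an aggregation such as count has to come first. A minimal recap sketch:

frm.as[Person].groupByKey(_.age > 40).count().show()   // typed: the key is the Boolean age > 40
frm.groupBy($"age" > 40).count().show()                // untyped equivalent on the DataFrame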
- scala> frm.groupBy().count
- res34: org.apache.spark.sql.DataFrame = [count: bigint]
- scala> frm.groupBy().count.show
- +-----+
- |count|
- +-----+
- | 5|
- +-----+
- scala> frm.groupBy().count.expl
- explain explode
- scala> frm.groupBy().count.explain
- == Physical Plan ==
- *HashAggregate(keys=[], functions=[count(1)])
- +- Exchange SinglePartition
- +- *HashAggregate(keys=[], functions=[partial_count(1)])
- +- *Scan csv [] Format: CSV, InputPaths: file:/Users/konradm/Desktop/people.csv, PushedFilters: [], ReadSchema: struct<>
- scala> frm.groupBy().count.explain(extended = true)
- == Parsed Logical Plan ==
- Aggregate [count(1) AS count#230L]
- +- Relation[id#58,name#59,age#60] csv
- == Analyzed Logical Plan ==
- count: bigint
- Aggregate [count(1) AS count#230L]
- +- Relation[id#58,name#59,age#60] csv
- == Optimized Logical Plan ==
- Aggregate [count(1) AS count#230L]
- +- Project
- +- Relation[id#58,name#59,age#60] csv
- == Physical Plan ==
- *HashAggregate(keys=[], functions=[count(1)], output=[count#230L])
- +- Exchange SinglePartition
- +- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#235L])
- +- *Scan csv [] Format: CSV, InputPaths: file:/Users/konradm/Desktop/people.csv, PushedFilters: [], ReadSchema: struct<>
- scala> // frm.as[Person].groupByKey(_.age > 40).count.show
- scala> val windowSpec = Window.partitionBy($"age" > 40).orderBy("age")
- <console>:23: error: not found: value Window
- val windowSpec = Window.partitionBy($"age" > 40).orderBy("age")
- ^
- scala> import org.apache.spark.sql.
- AnalysisException DataFrameReader Encoder KeyValueGroupedDataset SQLContext TypedColumn execution jdbc types
- Column DataFrameStatFunctions Encoders RelationalGroupedDataset SQLImplicits UDFRegistration expressions package util
- ColumnName DataFrameWriter ExperimentalMethods Row SaveMode api functions sources
- DataFrame Dataset ForeachWriter RowFactory SparkSession catalog hive streaming
- DataFrameNaFunctions DatasetHolder InternalOutputModes RuntimeConfig Strategy catalyst internal test
- scala> import org.apache.
- avro commons derby hadoop html http jute mesos parquet thrift xbean xml
- calcite curator directory hive htrace ivy log4j oro spark wml xerces zookeeper
- scala> import org.apache.spark.
- Accumulable GrowableAccumulableParam RangeDependency SparkException TaskEndReason deploy repl
- AccumulableParam HashPartitioner RangePartitioner SparkExecutorInfo TaskFailedReason executor rpc
- Accumulator InternalAccumulator Resubmitted SparkExecutorInfoImpl TaskKilled graphx scheduler
- AccumulatorParam InterruptibleIterator SPARK_BRANCH SparkFiles TaskKilledException input security
- Aggregator JavaFutureActionWrapper SPARK_BUILD_DATE SparkFirehoseListener TaskNotSerializableException internal serializer
- CleanerListener JobExecutionStatus SPARK_BUILD_USER SparkJobInfo TaskResultLost io shuffle
- CleanupTask JobSubmitter SPARK_REPO_URL SparkJobInfoImpl TaskSchedulerIsSet launcher sql
- CleanupTaskWeakReference MapOutputStatistics SPARK_REVISION SparkMasterRegex TaskState mapred status
- ComplexFutureAction MapOutputTrackerMaster SPARK_VERSION SparkStageInfo TestUtils memory storage
- Dependency MapOutputTrackerMasterEndpoint SerializableWritable SparkStageInfoImpl ThrowableSerializationWrapper metrics streaming
- ExceptionFailure MapOutputTrackerMessage ShuffleDependency SparkStatusTracker UnknownReason ml tags
- ExecutorAllocationClient MapOutputTrackerWorker SimpleFutureAction SpillListener WritableConverter mllib ui
- ExecutorLostFailure NarrowDependency SparkConf StopMapOutputTracker WritableFactory network unsafe
- ExpireDeadHosts OneToOneDependency SparkContext Success annotation package unused
- FetchFailed Partition SparkDriverExecutionException TaskCommitDenied api partial util
- FutureAction Partitioner SparkEnv TaskContext broadcast rdd
- scala> import org.apache.spark.sql.expressions.Window
- Window WindowSpec
- scala> import org.apache.spark.sql.expressions
- import org.apache.spark.sql.expressions
- scala> val windowSpec = Window.partitionBy($"age" > 40).orderBy("age")
- <console>:24: error: not found: value Window
- val windowSpec = Window.partitionBy($"age" > 40).orderBy("age")
- ^
- scala> import org.apache.spark.sql.expressions._
- import org.apache.spark.sql.expressions._
- scala> val windowSpec = Window.partitionBy($"age" > 40).orderBy("age")
- windowSpec: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@8ad08cc
- scala> people.withColumn("rank", rank over windowSpec).filter($"rank" === 1).select('name, 'age).show
- <console>:30: error: not found: value people
- people.withColumn("rank", rank over windowSpec).filter($"rank" === 1).select('name, 'age).show
- ^
- scala> frm.withColumn("rank", rank over windowSpec).filter($"rank" === 1).select('name, 'age).show
- +------+---+
- | name|age|
- +------+---+
- | Jacek| 42|
- |Unmesh| 10|
- +------+---+
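The working window query, gathered into one place with the imports it needed (rank is already in scope in spark-shell via org.apache.spark.sql.functions._). A minimal sketch:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rank       // explicit import for use outside the shell

val windowSpec = Window.partitionBy($"age" > 40).orderBy("age")
frm.withColumn("rank", rank().over(windowSpec))  // rank within each age-band partition
  .filter($"rank" === 1)                         // keep the youngest person per partition
  .select('name, 'age)
  .show()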
- scala> import org.apache.spark.ml._
- import org.apache.spark.ml._
- scala> org.apache.spark.ml.
- Estimator PipelineModel Predictor UnaryTransformer classification feature optim python regression tree
- Model PipelineStage PredictorParams ann clustering impl package r source tuning
- Pipeline PredictionModel Transformer attribute evaluation linalg param recommendation stat util
- scala> org.apache.spark.ml.tree.
- CategoricalSplit DecisionTreeModelReadWrite GBTParams LeafNode RandomForestRegressionModelParams impl
- ContinuousSplit DecisionTreeParams HasFeatureSubsetStrategy Node RandomForestRegressorParams
- DecisionTreeClassifierParams DecisionTreeRegressorParams HasNumTrees RandomForestClassificationModelParams Split
- DecisionTreeModel EnsembleModelReadWrite InternalNode RandomForestClassifierParams TreeEnsembleParams
- scala> import org.apache.spark.ml.Transformer
- import org.apache.spark.ml.Transformer
- scala> //2 transformers
- scala> //tokenizer
- scala> //hashing tf
- scala> import org.apache.spark.ml.feature._
- import org.apache.spark.ml.feature._
- scala> val tok = new Tokenizer()
- tok: org.apache.spark.ml.feature.Tokenizer = tok_680b50261bfe
- scala> val emails = Seq(
- | Display all 759 possibilities? (y or n)
- | (0, "This is NOT a spam", 1)
- | (1, "Hey Jacek, Wanna salary rise?", 0),
- | (2, "No i jak Pawel, dobrze sie bawisz?", 0),
- |
- |
- You typed two blank lines. Starting a new command.
- scala> val emails = Seq(
- | (0, "this is NOT a spam", 1),
- | (1, "Hey Jacek, Wanna salaary rise?", 0),
- | (2, "No i jak Pawel, dobrze sie bawisz?", 0),
- | (3, "SPAM VIAGRA ENLARGE P..S", 1)).toDF("id", "email", "label")
- emails: org.apache.spark.sql.DataFrame = [id: int, email: string ... 1 more field]
- scala> emails.show
- +---+--------------------+-----+
- | id| email|label|
- +---+--------------------+-----+
- | 0| this is NOT a spam| 1|
- | 1|Hey Jacek, Wanna ...| 0|
- | 2|No i jak Pawel, d...| 0|
- | 3|SPAM VIAGRA ENLAR...| 1|
- +---+--------------------+-----+
- scala> emails.show(false)
- +---+----------------------------------+-----+
- |id |email |label|
- +---+----------------------------------+-----+
- |0 |this is NOT a spam |1 |
- |1 |Hey Jacek, Wanna salaary rise? |0 |
- |2 |No i jak Pawel, dobrze sie bawisz?|0 |
- |3 |SPAM VIAGRA ENLARGE P..S |1 |
- +---+----------------------------------+-----+
- scala> tok.transform
- transform transformSchema
- scala> tok.transform
- override def transform(dataset: org.apache.spark.sql.Dataset[_]): org.apache.spark.sql.DataFrame
- def transform(dataset: org.apache.spark.sql.Dataset[_],paramMap: org.apache.spark.ml.param.ParamMap): org.apache.spark.sql.DataFrame
- def transform(dataset: org.apache.spark.sql.Dataset[_],firstParamPair: org.apache.spark.ml.param.ParamPair[_],otherParamPairs: org.apache.spark.ml.param.ParamPair[_]*): org.apache.spark.sql.DataFrame
- scala> tok.transform
- override def transform(dataset: org.apache.spark.sql.Dataset[_]): org.apache.spark.sql.DataFrame
- def transform(dataset: org.apache.spark.sql.Dataset[_],paramMap: org.apache.spark.ml.param.ParamMap): org.apache.spark.sql.DataFrame
- def transform(dataset: org.apache.spark.sql.Dataset[_],firstParamPair: org.apache.spark.ml.param.ParamPair[_],otherParamPairs: org.apache.spark.ml.param.ParamPair[_]*): org.apache.spark.sql.DataFrame
- scala> tok.transform(emails)
- java.util.NoSuchElementException: Failed to find a default value for inputCol
- at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:660)
- at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:660)
- at scala.Option.getOrElse(Option.scala:121)
- at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:659)
- at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
- at org.apache.spark.ml.param.Params$class.$(params.scala:664)
- at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
- at org.apache.spark.ml.UnaryTransformer.transformSchema(Transformer.scala:109)
- at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:70)
- at org.apache.spark.ml.UnaryTransformer.transform(Transformer.scala:120)
- ... 54 elided
- scala> tok.setInputCol("email").transform(emails)
- res43: org.apache.spark.sql.DataFrame = [id: int, email: string ... 2 more fields]
- scala> tok.setInputCol("email").transform(emails).show(false)
- +---+----------------------------------+-----+------------------------------------------+
- |id |email |label|tok_680b50261bfe__output |
- +---+----------------------------------+-----+------------------------------------------+
- |0 |this is NOT a spam |1 |[this, is, not, a, spam] |
- |1 |Hey Jacek, Wanna salaary rise? |0 |[hey, jacek,, wanna, salaary, rise?] |
- |2 |No i jak Pawel, dobrze sie bawisz?|0 |[no, i, jak, pawel,, dobrze, sie, bawisz?]|
- |3 |SPAM VIAGRA ENLARGE P..S |1 |[spam, viagra, enlarge, p..s] |
- +---+----------------------------------+-----+------------------------------------------+
- scala> val tokens = tok.setInputCol("email").transform(emails).show(false)
- +---+----------------------------------+-----+------------------------------------------+
- |id |email |label|tok_680b50261bfe__output |
- +---+----------------------------------+-----+------------------------------------------+
- |0 |this is NOT a spam |1 |[this, is, not, a, spam] |
- |1 |Hey Jacek, Wanna salaary rise? |0 |[hey, jacek,, wanna, salaary, rise?] |
- |2 |No i jak Pawel, dobrze sie bawisz?|0 |[no, i, jak, pawel,, dobrze, sie, bawisz?]|
- |3 |SPAM VIAGRA ENLARGE P..S |1 |[spam, viagra, enlarge, p..s] |
- +---+----------------------------------+-----+------------------------------------------+
- tokens: Unit = ()
- scala> tokens.printSchema
- <console>:41: error: value printSchema is not a member of Unit
- tokens.printSchema
- ^
- scala> tokens
- scala> val tokens = tok.setInputCol("email").transform(emails)
- tokens: org.apache.spark.sql.DataFrame = [id: int, email: string ... 2 more fields]
- scala> tokens
- res47: org.apache.spark.sql.DataFrame = [id: int, email: string ... 2 more fields]
- scala> tokens.printSchema
- root
- |-- id: integer (nullable = false)
- |-- email: string (nullable = true)
- |-- label: integer (nullable = false)
- |-- tok_680b50261bfe__output: array (nullable = true)
- | |-- element: string (containsNull = true)
- scala> val hashTF = new HashingTF().setInputCol(tok.getOutputCol).setOutputCol("features")
- hashTF: org.apache.spark.ml.feature.HashingTF = hashingTF_bb9390313079
- scala> hashTF
- res49: org.apache.spark.ml.feature.HashingTF = hashingTF_bb9390313079
- scala> hashTF.transform
- transform transformSchema
- scala> hashTF.transform
- override def transform(dataset: org.apache.spark.sql.Dataset[_]): org.apache.spark.sql.DataFrame
- def transform(dataset: org.apache.spark.sql.Dataset[_],paramMap: org.apache.spark.ml.param.ParamMap): org.apache.spark.sql.DataFrame
- def transform(dataset: org.apache.spark.sql.Dataset[_],firstParamPair: org.apache.spark.ml.param.ParamPair[_],otherParamPairs: org.apache.spark.ml.param.ParamPair[_]*): org.apache.spark.sql.DataFrame
- scala> hashTF.transform(tokens)
- res50: org.apache.spark.sql.DataFrame = [id: int, email: string ... 3 more fields]
- scala> hashTF.transform(tokens).show
- +---+--------------------+-----+------------------------+--------------------+
- | id| email|label|tok_680b50261bfe__output| features|
- +---+--------------------+-----+------------------------+--------------------+
- | 0| this is NOT a spam| 1| [this, is, not, a...|(262144,[15889,10...|
- | 1|Hey Jacek, Wanna ...| 0| [hey, jacek,, wan...|(262144,[25736,66...|
- | 2|No i jak Pawel, d...| 0| [no, i, jak, pawe...|(262144,[24417,53...|
- | 3|SPAM VIAGRA ENLAR...| 1| [spam, viagra, en...|(262144,[4428,137...|
- +---+--------------------+-----+------------------------+--------------------+
- scala> val hashed = hashTF.transform(tokens)
- hashed: org.apache.spark.sql.DataFrame = [id: int, email: string ... 3 more fields]
- scala> hashed.printSchema
- root
- |-- id: integer (nullable = false)
- |-- email: string (nullable = true)
- |-- label: integer (nullable = false)
- |-- tok_680b50261bfe__output: array (nullable = true)
- | |-- element: string (containsNull = true)
- |-- features: vector (nullable = true)
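The auto-generated output column name (tok_680b50261bfe__output) can be replaced with an explicit one, which keeps the downstream stage wiring readable. A minimal sketch of the same two transformers with named columns, assuming the emails frame defined above:

import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

val tok    = new Tokenizer().setInputCol("email").setOutputCol("words")
val hashTF = new HashingTF().setInputCol("words").setOutputCol("features")
val hashed = hashTF.transform(tok.transform(emails))   // email -> words -> hashed term frequencies
hashed.select("email", "features").show(false)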
- scala> import org.apache.spark.ml.classification._
- import org.apache.spark.ml.classification._
- scala> val logReg = new Logistic
- LogisticAggregator LogisticRegression LogisticRegressionParams LogisticRegressionTrainingSummary
- LogisticCostFun LogisticRegressionModel LogisticRegressionSummary
- scala> val logReg = new LogisticRegression
- LogisticRegression LogisticRegressionModel LogisticRegressionParams LogisticRegressionSummary LogisticRegressionTrainingSummary
- scala> val logReg = new LogisticRegression()
- logReg: org.apache.spark.ml.classification.LogisticRegression = logreg_e6ccb54d045e
- scala> val model = logReg.fit(hashed)
- 16/10/18 16:27:33 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
- 16/10/18 16:27:33 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
- model: org.apache.spark.ml.classification.LogisticRegressionModel = logreg_e6ccb54d045e
- scala> model
- res53: org.apache.spark.ml.classification.LogisticRegressionModel = logreg_e6ccb54d045e
- scala> model.
- clear featuresCol getMaxIter getThreshold intercept parent setParent threshold weightCol
- coefficients fitIntercept getOrDefault getThresholds isDefined predictionCol setPredictionCol thresholds write
- copy get getParam getTol isSet probabilityCol setProbabilityCol toString
- elasticNetParam getDefault getPredictionCol getWeightCol labelCol rawPredictionCol setRawPredictionCol tol
- evaluate getElasticNetParam getProbabilityCol hasDefault maxIter regParam setThreshold transform
- explainParam getFeaturesCol getRawPredictionCol hasParam numClasses save setThresholds transformSchema
- explainParams getFitIntercept getRegParam hasParent numFeatures set standardization uid
- extractParamMap getLabelCol getStandardization hasSummary params setFeaturesCol summary validateParams
- scala> model.transform
- transform transformSchema
- scala> model.transform
- override def transform(dataset: org.apache.spark.sql.Dataset[_]): org.apache.spark.sql.DataFrame
- def transform(dataset: org.apache.spark.sql.Dataset[_],paramMap: org.apache.spark.ml.param.ParamMap): org.apache.spark.sql.DataFrame
- def transform(dataset: org.apache.spark.sql.Dataset[_],firstParamPair: org.apache.spark.ml.param.ParamPair[_],otherParamPairs: org.apache.spark.ml.param.ParamPair[_]*): org.apache.spark.sql.DataFrame
- scala> model.transform(hashed).select('label, 'prediction).show
- +-----+----------+
- |label|prediction|
- +-----+----------+
- | 1| 1.0|
- | 0| 0.0|
- | 0| 0.0|
- | 1| 1.0|
- +-----+----------+
- scala> //microsoft hdinsight
- scala> val pipeline = new Pipeline().setStages(Array(tok, hashTF, logReg))
- pipeline: org.apache.spark.ml.Pipeline = pipeline_e47df7c4f800
- scala> val model = pipeline.fit(emails)
- model: org.apache.spark.ml.PipelineModel = pipeline_e47df7c4f800
- scala> val testing = Seq(
- | "To nie jest SPAM",
- | "To jest").toDF("email")
- testing: org.apache.spark.sql.DataFrame = [email: string]
- scala> model.transform(testing).show(false)
- +----------------+------------------------+--------------------------------------------------------+------------------------------------------+-----------------------------------------+----------+
- |email |tok_680b50261bfe__output|features |rawPrediction |probability |prediction|
- +----------------+------------------------+--------------------------------------------------------+------------------------------------------+-----------------------------------------+----------+
- |To nie jest SPAM|[to, nie, jest, spam] |(262144,[152658,197793,205044,215930],[1.0,1.0,1.0,1.0])|[-6.102587858518542,6.102587858518542] |[0.002232077682682062,0.9977679223173178]|1.0 |
- |To jest |[to, jest] |(262144,[205044,215930],[1.0,1.0]) |[-0.24283585831430904,0.24283585831430904]|[0.4395876168221684,0.5604123831778316] |1.0 |
- +----------------+------------------------+--------------------------------------------------------+------------------------------------------+-----------------------------------------+----------+
- scala> model.transform(testing).select('email, 'prediction).show(false)
- +----------------+----------+
- |email |prediction|
- +----------------+----------+
- |To nie jest SPAM|1.0 |
- |To jest |1.0 |
- +----------------+----------+
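The whole flow above, restated as one block: three stages wired into a Pipeline, fit on the labelled emails, then applied to the unlabelled testing frame. A minimal sketch using the column names from this session:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

val tok      = new Tokenizer().setInputCol("email").setOutputCol("words")
val hashTF   = new HashingTF().setInputCol("words").setOutputCol("features")
val logReg   = new LogisticRegression()                 // reads "features" and "label" columns by default
val pipeline = new Pipeline().setStages(Array(tok, hashTF, logReg))

val model = pipeline.fit(emails)                        // emails: (id, email, label)
model.transform(testing).select('email, 'prediction).show(false)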
- scala> model.save.
- clear explainParam extractParamMap getDefault getParam hasParam isDefined params save setParent toString transformSchema validateParams
- copy explainParams get getOrDefault hasDefault hasParent isSet parent set stages transform uid write
- scala> model.write.
- context overwrite save session
- scala> model.write.save("moj-model-sopocki")
- SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
- SLF4J: Defaulting to no-operation (NOP) logger implementation
- SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
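A saved PipelineModel can be read back in a later session. A minimal sketch, assuming the same "moj-model-sopocki" directory written above:

import org.apache.spark.ml.PipelineModel

val reloaded = PipelineModel.load("moj-model-sopocki")  // reads the metadata and stage data written by write.save
reloaded.transform(testing).select('email, 'prediction).show(false)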
- scala> //jpmml
- scala>