- Prerequisites:
- >> Enable HiveServer2 Interactive (LLAP) in Hive
- >> Get the following details from Hive for Spark (a programmatic sketch of the same settings follows the list):
- spark.hadoop.hive.llap.daemon.service.hosts @llap0
- spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
- spark.datasource.hive.warehouse.load.staging.dir /tmp
- spark.datasource.hive.warehouse.metastoreUri thrift://c420-node3.squadron-labs.com:9083
- spark.security.credentials.hiveserver2.enabled false
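- For illustration only, the same properties could also be set programmatically when building a SparkSession; this is a sketch (host names and values copied from the list above, the app name is made up). In the steps below they are passed on the spark-shell command line instead.
```
import org.apache.spark.sql.SparkSession

// Sketch: wire the HWC prerequisites into a SparkSession programmatically.
// Values are the ones listed above; adjust them to your cluster.
val spark = SparkSession.builder()
  .appName("hwc-prereq-check")   // illustrative app name
  .config("spark.hadoop.hive.llap.daemon.service.hosts", "@llap0")
  .config("spark.sql.hive.hiveserver2.jdbc.url",
    "jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive")
  .config("spark.datasource.hive.warehouse.load.staging.dir", "/tmp")
  .config("spark.datasource.hive.warehouse.metastoreUri", "thrift://c420-node3.squadron-labs.com:9083")
  .config("spark.security.credentials.hiveserver2.enabled", "false")
  .getOrCreate()
```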
- Basic testing:
- 1) Create a table employee in Hive and load some data
- e.g.:
- Create table
- ----------------
```
CREATE TABLE IF NOT EXISTS employee ( eid int, name String, salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```
- Load the data.txt file into HDFS (at /tmp/data.txt, as referenced by the LOAD DATA statement below)
- ---------------
```
1201,Gopal,45000,Technical manager
1202,Manisha,45000,Proof reader
1203,Masthanvali,40000,Technical writer
1204,Kiran,40000,Hr Admin
1205,Kranthi,30000,Op Admin
```
```
LOAD DATA INPATH '/tmp/data.txt' OVERWRITE INTO TABLE employee;
```
- 2) kinit as the spark user and run:
```
spark-shell --master yarn \
  --conf "spark.security.credentials.hiveserver2.enabled=false" \
  --conf "spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;principal=hive/_HOST@HWX.COM" \
  --conf "spark.datasource.hive.warehouse.metastoreUri=thrift://c420-node3.squadron-labs.com:9083" \
  --conf "spark.datasource.hive.warehouse.load.staging.dir=/tmp/" \
  --conf "spark.hadoop.hive.llap.daemon.service.hosts=@llap0" \
  --conf "spark.hadoop.hive.zookeeper.quorum=c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181" \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar
```
- Note: spark.security.credentials.hiveserver2.enabled should be set to false for YARN client deploy mode and to true (the default) for YARN cluster deploy mode. This configuration is required on a Kerberized cluster.
- 3) Run the following code in the Scala shell to view the table data; a slightly longer read example follows the snippet.
```
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show
hive.executeQuery("select * from employee").show
```
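- As a further illustration, a read-side sketch using the same session. It assumes the employee table from step 1 exists in the default database (the database name and the filter are assumptions for the example):
```
import com.hortonworks.hwc.HiveWarehouseSession

// Build the HWC session as above, then read through LLAP into a Spark DataFrame.
val hive = HiveWarehouseSession.session(spark).build()
hive.setDatabase("default")   // assumption: employee lives in the 'default' database

// Push a simple projection/filter down to Hive; eid, name, salary come from the table above.
val df = hive.executeQuery("select eid, name, salary from employee where eid >= 1203")
df.printSchema()
df.show()
```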
- 4) To apply the common properties by default, add the following settings to the Spark2 custom configuration in Ambari:
- spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;principal=hive/_HOST@HWX.COM
- spark.datasource.hive.warehouse.metastoreUri=thrift://c420-node3.squadron-labs.com:9083
- spark.datasource.hive.warehouse.load.staging.dir=/tmp/
- spark.hadoop.hive.llap.daemon.service.hosts=@llap0
- spark.hadoop.hive.zookeeper.quorum=c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181
- 5) spark-shell --master yarn --conf "spark.security.credentials.hiveserver2.enabled=false" --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar
- Note: The common properties are now read from the Spark default properties.
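- (Optional) A quick sanity check in spark-shell, assuming the defaults above were applied, to confirm the properties were picked up before building the HWC session:
```
// Each call returns Some(value) when the property is present in the Spark conf, None otherwise.
spark.sparkContext.getConf.getOption("spark.sql.hive.hiveserver2.jdbc.url")
spark.sparkContext.getConf.getOption("spark.datasource.hive.warehouse.metastoreUri")
spark.sparkContext.getConf.getOption("spark.hadoop.hive.llap.daemon.service.hosts")
```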
- 6) Run the following code in the Scala shell to view the Hive table data; a write-back sketch follows the snippet.
```
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show
hive.executeQuery("select * from employee").show
```
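- A hedged write-back sketch using the same session: the target table name employee_copy is made up for illustration, and the data source format string comes from the connector's HIVE_WAREHOUSE_CONNECTOR constant.
```
import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._

val hive = HiveWarehouseSession.session(spark).build()

// Read the table through LLAP, then write the rows to another Hive table
// via the HWC data source (employee_copy is an illustrative table name).
val df = hive.executeQuery("select * from employee")
df.write
  .format(HIVE_WAREHOUSE_CONNECTOR)
  .mode("append")
  .option("table", "employee_copy")
  .save()
```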
- 7) To integrate HWC with Livy2 (e.g., for Zeppelin's %livy2 interpreter):
- a) Add the following property in Custom livy2-conf:
- livy.file.local-dir-whitelist=/usr/hdp/current/hive_warehouse_connector/
- b) Add hive-site.xml to /usr/hdp/current/spark2-client/conf on all cluster nodes.
- c) In the livy2 interpreter settings, add the following:
- livy.spark.hadoop.hive.llap.daemon.service.hosts @llap0
- livy.spark.security.credentials.hiveserver2.enabled true
- livy.spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
- livy.spark.sql.hive.hiveserver2.jdbc.url.principal hive/_HOST@HWX.COM
- livy.spark.yarn.security.credentials.hiveserver2.enabled true
- livy.spark.jars file:///usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar
- d) Restart the livy2 interpreter
- e) In the first paragraph, add:
- %livy2
- import com.hortonworks.hwc.HiveWarehouseSession
- val hive = HiveWarehouseSession.session(spark).build()
- f) In the second paragraph, add the following (an optional further paragraph is sketched after it):
- %livy2
- hive.executeQuery("select * from employee").show
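- Optionally, a further paragraph can reuse the same session for ad hoc queries; the statements below are examples only (the filter is an assumption):
```
%livy2
// Example only: reuse the hive session built in the first paragraph.
hive.execute("describe formatted employee").show(false)
hive.executeQuery("select name, salary from employee where eid >= 1203").show()
```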