Guest User

Untitled

a guest
Nov 4th, 2017
421
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 1.74 KB | None | 0 0
  1. Step 1: Login into hortonworks sandbox
  2. Ssh root@127.0.0.1 -p 2222
  3. Password:maria_dev
  4.  
  5. Step 2: Create 2 Hive tables with same structure as users table (inside test2 database) on MySql - users_orc and users_tmp (use beeline or Hive View via Ambari)
  6. create table users_orc (id BIGINT, email STRING, name STRING, create_date STRING, update_date STRING)
  7. CLUSTERED BY (id) into 5 buckets
  8. stored as ORC
  9. TBLPROPERTIES ("transactional"="true");
  10.  
  11. create table users_tmp (id BIGINT, email STRING, name STRING, create_date STRING, update_date STRING)
  12. CLUSTERED BY (id) into 5 buckets
  13. stored as ORC
  14. TBLPROPERTIES ("transactional"="true");
  15.  
  16. Step 3: Start spark shell with mysql connector
  17. cd /usr/hdp/2.6/spark2/bin
  18. ./spark-shell --jars ./../jars/mysql-connector-java-5.1.44-bin.jar
  19. --Spark script:
  20. import org.apache.spark.sql.hive.HiveContext;
  21. val mysql_props = new java.util.Properties;
  22. mysql_props.setProperty("user","root");
  23. mysql_props.setProperty("password","Clair2016");
  24. val connection="jdbc:mysql://10.0.2.2:3306/test2";
  25. val sqlContext = new HiveContext(sc);
  26. val users=sqlContext.read.format("orc").jdbc(connection,"users",mysql_props);
  27. users.show;
  28. users.write.format("orc").mode("overwrite").saveAsTable("users_table");
  29. val hiveCtx = new HiveContext(sc)
  30. val topRows=hiveCtx.sql("SELECT id, email, name FROM users_table LIMIT 10");
  31. topRows.show;
  32. hiveCtx.sql("insert overwrite table users_tmp (select * from users_table)");
  33.  
  34. Step 4: Do merge in Hive (use beeline or Hive View)
  35. set hive.enforce.bucketing = true;
  36. merge into users_orc using (select * from users_tmp) sub on sub.id=users_orc.id
  37. when matched then update set name=sub.name, email=sub.email, update_date=sub.update_date
  38. when not matched then insert values (sub.id, sub.email, sub.name, sub.create_date, sub.update_date);
Add Comment
Please, Sign In to add comment