# Practice Problem:

- Using sqoop, import the orders table into HDFS under /user/cloudera/problem1/orders. Use any file type and compression codec of your choice.
- Using sqoop, import the order_items table into HDFS under /user/cloudera/problem1/order-items. Use any file type and compression codec of your choice.
- Using Spark Scala, load the data at /user/cloudera/problem1/orders and /user/cloudera/problem1/order-items as dataframes.
- Find the total orders and total amount per status per day. The result should be sorted by order date in descending, order status in ascending, total amount in descending and total orders in ascending.
- Solve this with both Spark SQL and the Spark dataframe API.
- Save the result as CSV and load the CSV back into MySQL, into a table named result-a.
# SOLUTION:

```
# Optional: preview the source table before importing
sqoop eval \
--connect 'jdbc:mysql://localhost:3306/retail_db' \
--username root \
--password cloudera \
--query 'select * from orders'

sqoop import \
--connect 'jdbc:mysql://localhost:3306/retail_db' \
--username root \
--password cloudera \
--table orders \
--target-dir /user/cloudera/problem1/orders \
--as-avrodatafile \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec

sqoop import \
--connect 'jdbc:mysql://localhost:3306/retail_db' \
--username root \
--password cloudera \
--table order_items \
--target-dir /user/cloudera/problem1/order-items \
--as-avrodatafile \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec
```

```
// spark-shell with the com.databricks:spark-avro package on the classpath
sqlContext.setConf("spark.sql.shuffle.partitions", "4")

val orders = sqlContext.read.format("com.databricks.spark.avro").load("/user/cloudera/problem1/orders")
val order_items = sqlContext.read.format("com.databricks.spark.avro").load("/user/cloudera/problem1/order-items")

orders.registerTempTable("orders")
order_items.registerTempTable("order_items")

// Required sort: order_date desc, order_status asc, total_amount desc, total_orders asc
val res = sqlContext.sql("""
select o.order_date, o.order_status,
       count(distinct o.order_id) total_orders,
       sum(oi.order_item_subtotal) total_amount
from orders o
join order_items oi on oi.order_item_order_id = o.order_id
group by o.order_date, o.order_status
order by o.order_date desc, o.order_status, total_amount desc, total_orders
""")
res.show()
```
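Note the `count(distinct o.order_id)`: the join repeats each order row once per line item, so a plain `count(o.order_id)` would count items, not orders. A minimal in-memory sketch of what the aggregation computes, using made-up rows (the `Joined` case class and its values are illustrative, not from the dataset):

```scala
// Each row models one line of the orders ⋈ order_items result:
// (order_id, order_date, order_status, order_item_subtotal),
// so an order with two items contributes two rows.
case class Joined(orderId: Int, date: Long, status: String, subtotal: Double)

val rows = Seq(
  Joined(1, 1406185200000L, "CLOSED", 100.0),
  Joined(1, 1406185200000L, "CLOSED", 50.0),  // second item, same order
  Joined(2, 1406185200000L, "CLOSED", 25.0)
)

// count(distinct order_id): a plain count would give 3, not 2
val totalOrders = rows.map(_.orderId).distinct.size  // 2
// sum(order_item_subtotal)
val totalAmount = rows.map(_.subtotal).sum           // 175.0
```

The same distinction is why the dataframe version further down uses `countDistinct` rather than `count`.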
```
+-------------+---------------+------------+------------------+
|   order_date|   order_status|total_orders|      total_amount|
+-------------+---------------+------------+------------------+
|1406185200000|       CANCELED|           2|1254.9200382232666|
|1406185200000|         CLOSED|          26|16333.160339355469|
|1406185200000|       COMPLETE|          55| 34552.03063583374|
|1406185200000|        ON_HOLD|           4|1709.7400207519531|
|1406185200000| PAYMENT_REVIEW|           1|499.95001220703125|
|1406185200000|        PENDING|          22|12729.490217208862|
|1406185200000|PENDING_PAYMENT|          34|17680.700359344482|
|1406185200000|     PROCESSING|          17| 9964.740190505981|
|1406185200000|SUSPECTED_FRAUD|           4|2351.6100215911865|
|1406098800000|       CANCELED|          10| 5777.330112457275|
|1406098800000|         CLOSED|          18|  13312.7202835083|
|1406098800000|       COMPLETE|          40|25482.510496139526|
|1406098800000|        ON_HOLD|           6| 4514.460060119629|
|1406098800000| PAYMENT_REVIEW|           2|1699.8200302124023|
|1406098800000|        PENDING|          11|   6161.3701171875|
|1406098800000|PENDING_PAYMENT|          30|19279.810424804688|
|1406098800000|     PROCESSING|          15| 7962.790130615234|
|1406098800000|SUSPECTED_FRAUD|           6|3799.5700721740723|
|1406012400000|       CANCELED|           4| 3209.730094909668|
|1406012400000|         CLOSED|          20| 12688.79024887085|
+-------------+---------------+------------+------------------+
```
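The order_date column prints as epoch milliseconds because the Avro import stores MySQL DATE values as a long. A quick sanity check of the first value, interpreted in UTC so the result is deterministic:

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// 1406185200000 ms since the epoch, rendered as a UTC calendar date
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC)
val readable = fmt.format(Instant.ofEpochMilli(1406185200000L))
// readable == "2014-07-24"
```

On the Spark side, an expression along the lines of `from_unixtime($"order_date" / 1000, "yyyy-MM-dd")` (a sketch, not run here) would render the column readably; the exam problem does not require it, so the raw millis are kept above.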
### Export data from HDFS to MySQL

```
scala> res.map(x => x(0) + "," + x(1) + "," + x(2) + "," + x(3)).saveAsTextFile("/user/cloudera/problem1/resulta-csv")

mysql> create table result (col1 varchar(30), col2 varchar(30), col3 varchar(30), col4 varchar(30));

# Export into the table created above. A hyphenated identifier such as
# result-a would need quoting in both MySQL and sqoop's generated SQL,
# so the table is simply named result here.
$ sqoop export \
--connect 'jdbc:mysql://localhost:3306/retail_db' \
--username root \
--password cloudera \
--table result \
--export-dir /user/cloudera/problem1/resulta-csv \
--input-fields-terminated-by ',' \
--input-lines-terminated-by '\n'
```
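The `x(0) + "," + x(1) + ...` concatenation works for four columns, but `mkString` scales to any width. A sketch with a plain `Seq` standing in for one result `Row` (the values are copied from the output above):

```scala
// A plain Seq standing in for row.toSeq on one result row
val row: Seq[Any] = Seq(1406185200000L, "CLOSED", 26, 16333.160339355469)
val line = row.mkString(",")
// line == "1406185200000,CLOSED,26,16333.160339355469"
```

On a dataframe this becomes `res.map(_.mkString(",")).saveAsTextFile(...)`, since Spark's `Row` has its own `mkString`.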
### Using dataframe API

```
// Same sort: order_date desc, order_status asc, total_amount desc, total_orders asc
sqlContext.setConf("spark.sql.shuffle.partitions", "4")

// Load the same Avro imports as in part one
val orders = sqlContext.read.format("com.databricks.spark.avro").load("/user/cloudera/problem1/orders")
val order_items = sqlContext.read.format("com.databricks.spark.avro").load("/user/cloudera/problem1/order-items")

val res = orders.join(order_items).
  where($"order_id" === $"order_item_order_id").
  groupBy($"order_date", $"order_status").
  agg(countDistinct($"order_id").alias("total_orders"),
      sum($"order_item_subtotal").alias("total_amount")).
  sort(desc("order_date"), $"order_status", desc("total_amount"), $"total_orders")
res.show()
```
```
+-------------+---------------+------------+------------------+
|   order_date|   order_status|total_orders|      total_amount|
+-------------+---------------+------------+------------------+
|1406185200000|       CANCELED|           2|1254.9200382232666|
|1406185200000|         CLOSED|          26|16333.160339355469|
|1406185200000|       COMPLETE|          55| 34552.03063583374|
|1406185200000|        ON_HOLD|           4|1709.7400207519531|
|1406185200000| PAYMENT_REVIEW|           1|499.95001220703125|
|1406185200000|        PENDING|          22|12729.490217208862|
|1406185200000|PENDING_PAYMENT|          34|17680.700359344482|
|1406185200000|     PROCESSING|          17| 9964.740190505981|
|1406185200000|SUSPECTED_FRAUD|           4|2351.6100215911865|
|1406098800000|       CANCELED|          10| 5777.330112457275|
|1406098800000|         CLOSED|          18|  13312.7202835083|
|1406098800000|       COMPLETE|          40|25482.510496139526|
|1406098800000|        ON_HOLD|           6| 4514.460060119629|
|1406098800000| PAYMENT_REVIEW|           2|1699.8200302124023|
|1406098800000|        PENDING|          11|   6161.3701171875|
|1406098800000|PENDING_PAYMENT|          30|19279.810424804688|
|1406098800000|     PROCESSING|          15| 7962.790130615234|
|1406098800000|SUSPECTED_FRAUD|           6|3799.5700721740723|
|1406012400000|       CANCELED|           4| 3209.730094909668|
|1406012400000|         CLOSED|          20| 12688.79024887085|
+-------------+---------------+------------+------------------+
```