Untitled

a guest, Sep 11th, 2019
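// `staticSchema` is used by the streaming read below but is not defined in this paste.
// A minimal sketch of one way it might be obtained, assuming the schema is inferred
// from a one-off batch read of the same files. The batch read and the name
// `staticDataFrame` are assumptions for illustration, not part of the original paste.
val staticDataFrame = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true") // let Spark infer column types, e.g. a timestamp for InvoiceDate
  .load("/data/retail-data/by-day/*.csv")
val staticSchema = staticDataFrame.schema
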
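// Assumes an active SparkSession named `spark` (as in spark-shell).
import org.apache.spark.sql.functions.window
import spark.implicits._ // enables the $"column" syntax

// `staticSchema` must already be defined: a StructType describing the CSV columns.
// File-based streaming sources require an explicit schema.
val streamingDataFrame = spark.readStream
  .schema(staticSchema)
  .option("maxFilesPerTrigger", 1) // read at most one file per trigger
  .format("csv")
  .option("header", "true")
  .load("/data/retail-data/by-day/*.csv")

// Total spend per customer per window. Despite the "PerHour" name,
// the window duration used here is one day.
val purchaseByCustomerPerHour = streamingDataFrame
  .selectExpr(
    "CustomerId",
    "(UnitPrice * Quantity) as total_cost",
    "InvoiceDate")
  .groupBy(
    $"CustomerId", window($"InvoiceDate", "1 day"))
  .sum("total_cost")

purchaseByCustomerPerHour.writeStream
  .format("memory")                // memory = store the result in an in-memory table
  .queryName("customer_purchases") // the name of the in-memory table
  .outputMode("complete")          // complete = rewrite the full aggregated result on every trigger
  .start()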
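
// A minimal sketch of how the in-memory table written above might be inspected,
// assuming the same `spark` session and that the streaming query is still running.
// `sum(total_cost)` is the default column name produced by .sum("total_cost").
// The rows returned change as more files are processed, so no output is shown here.
spark.sql("""
  SELECT *
  FROM customer_purchases
  ORDER BY `sum(total_cost)` DESC
  """)
  .show(5)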