Aug 19th, 2018
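Install Spark and run a master and workers (slaves) in standalone mode. Both `spark-class` commands stay in the foreground, so run each one in its own terminal.

```
brew install apache-spark
# Start the master; it listens on port 7077 by default
/usr/local/Cellar/apache-spark/2.3.1/bin/spark-class org.apache.spark.deploy.master.Master
# Start a worker with 1 core and 512 MB of memory, registered with the master
/usr/local/Cellar/apache-spark/2.3.1/bin/spark-class org.apache.spark.deploy.worker.Worker spark://<MASTER_IP>:7077 -c 1 -m 512M
```
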
In PostgreSQL, create the table and load the CSV file
```
CREATE TABLE items (
    id VARCHAR(100),
    description VARCHAR(100)
);

-- Load CSV from file; HEADER skips the first line of the file
COPY items FROM '/Users/<pathTo>/items.csv' DELIMITER ',' CSV HEADER;
```
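
The `COPY` command above expects a comma-delimited file whose first line is a header. As a rough sketch of a file in that shape, the snippet below writes two rows with Python's `csv` module. The rows and the helper name `write_items_csv` are hypothetical, made up for illustration; only the `id` and `description` column names come from the table definition.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the items table.
SAMPLE_ROWS = [
    {"id": "1001", "description": "Blue mug"},
    {"id": "1002", "description": "Notebook, A5"},
]


def write_items_csv(rows, out):
    """Write rows to the file-like object `out` as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=["id", "description"])
    writer.writeheader()
    writer.writerows(rows)


# Write into an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
write_items_csv(SAMPLE_ROWS, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a real file, pass a handle opened with `open("items.csv", "w", newline="")` instead of the buffer. The `csv` module quotes the description that contains a comma, which the `CSV` option of `COPY` reads back as a single field.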

Launch the PySpark shell with the PostgreSQL JDBC driver on the driver and executor classpaths
```
pyspark --conf spark.executor.extraClassPath=/Users/<pathTo>/postgresql-42.2.4.jar --driver-class-path /Users/<pathTo>/postgresql-42.2.4.jar --master spark://<MASTER_IP>:7077 --executor-memory 512m
```

Connect to the existing PostgreSQL database

```
df = spark.read \
    .format("jdbc") \
    .option("driver", "org.postgresql.Driver") \
    .option("url", "jdbc:postgresql:retailme") \
    .option("dbtable", "items") \
    .option("user", "<postgres_user>") \
    .option("password", "") \
    .load()

df.count()  # Fires the query and displays the count; progress can be checked in the Spark UI

# Join: count the records whose id exists in df1 but not in df2.
# df1 and df2 are assumed to be two DataFrames loaded like df above.
left_join = df1.join(df2, df1.id == df2.id, how='left')  # Could also use 'left_outer'
# Both sides have an id column after the join, so refer to df2's explicitly
left_join.filter(df2.id.isNull()).count()

# Write data to tables
# "overwrite" replaces the existing table contents; writing a DataFrame back
# to the table it was read from is risky because the read is lazy
mode = "overwrite"
url = "jdbc:postgresql:retailme"
properties = {"user": "<postgres_user>", "password": "", "driver": "org.postgresql.Driver"}
df.write.jdbc(url=url, table="items", mode=mode, properties=properties)

```
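
To make the join step above concrete, here is a minimal plain-Python sketch of what a left join followed by a null filter computes. It does not use Spark. The `left_join` helper and the sample rows are hypothetical, written only to mirror the semantics on a tiny in-memory dataset.

```python
def left_join(left_rows, right_rows, key):
    """Return (left, right_or_None) pairs, mimicking a left outer join on `key`."""
    right_by_key = {}
    for row in right_rows:
        right_by_key.setdefault(row[key], []).append(row)

    pairs = []
    for row in left_rows:
        matches = right_by_key.get(row[key])
        if matches:
            # One output pair per matching right-hand row.
            pairs.extend((row, match) for match in matches)
        else:
            # No match: the right side is null, as in a left outer join.
            pairs.append((row, None))
    return pairs


# Hypothetical stand-ins for df1 and df2.
df1_rows = [{"id": "1"}, {"id": "2"}, {"id": "3"}]
df2_rows = [{"id": "2"}]

joined = left_join(df1_rows, df2_rows, "id")
# Keeping only pairs with a null right side leaves ids present in df1 but not df2.
missing_ids = [left["id"] for left, right in joined if right is None]
print(missing_ids, len(missing_ids))
```

In Spark the same count can also be obtained with `df1.join(df2, on="id", how="left_anti").count()`, which returns only the unmatched left-hand rows and avoids the separate null filter.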