Posted by a guest on Dec 17th, 2017
Exception: Java gateway process exited before sending the driver its port number
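This exception is raised before any of the code below runs: it usually means PySpark could not launch the JVM at all, most often because JAVA_HOME is unset or points at a directory with no `java` binary. A minimal sanity check (the `java_available` helper is illustrative, not part of PySpark):

```python
import os
import shutil

def java_available() -> bool:
    """Check whether PySpark could find a JVM to launch.

    PySpark starts the Java gateway via the `java` executable, looked up
    through JAVA_HOME/bin if JAVA_HOME is set, otherwise through PATH.
    """
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        return os.path.exists(os.path.join(java_home, "bin", "java"))
    return shutil.which("java") is not None

print("JVM reachable:", java_available())
```

If this prints `False`, install a JDK or point JAVA_HOME at an existing one before creating the SparkContext.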
from pyspark import SparkContext
from pyspark.sql import SQLContext
import pandas as pd

sc = SparkContext('local', 'example')  # if running locally
sql_sc = SQLContext(sc)  # needed so RDD.toDF() works

Spark_Full = sc.emptyRDD()
chunk_100k = pd.read_csv("contour-export-2017-12-14.csv", chunksize=100000)
# if you have headers in your csv file:
headers = list(pd.read_csv("contour-export-2017-12-14.csv", nrows=0).columns)

for chunky in chunk_100k:
    # RDD.__add__ is union(), so += appends each chunk's rows
    Spark_Full += sc.parallelize(chunky.values.tolist())

YourSparkDataFrame = Spark_Full.toDF(headers)
# if you do not have headers, call toDF() with no arguments instead:
# YourSparkDataFrame = Spark_Full.toDF()
YourSparkDataFrame.show()
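The chunked-read pattern above can be exercised without Spark at all. A minimal sketch, using a small synthetic CSV in place of contour-export-2017-12-14.csv, showing that pd.read_csv(..., chunksize=...) yields DataFrame pieces whose rows reassemble the full file, exactly what the loop feeds to sc.parallelize:

```python
import io
import pandas as pd

# Synthetic CSV standing in for the real export file
csv_text = "a,b\n" + "\n".join(f"{i},{i * i}" for i in range(10))

# chunksize=4 mirrors chunksize=100000 above, scaled down
chunks = list(pd.read_csv(io.StringIO(csv_text), chunksize=4))
rows = [row for chunk in chunks for row in chunk.values.tolist()]

# The concatenated chunk rows match a single full read
full = pd.read_csv(io.StringIO(csv_text))
assert rows == full.values.tolist()
print(len(chunks), "chunks,", len(rows), "rows")
```

Each element of `chunks` is an ordinary DataFrame, so `chunk.values.tolist()` is the same row-list conversion the Spark loop relies on.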