import csv

with open(myfile, "a") as csv_file:
    writer = csv.writer(csv_file, delimiter='\t')
    # Header row; with a tab delimiter each field lands in its own column.
    writer.writerow(["vertex", "id_source", "id_target", "similarity"])

    # Pull the DataFrame back one partition at a time and append each batch.
    for part_id in range(joinDesrdd_df.rdd.getNumPartitions()):
        part_rdd = joinDesrdd_df.rdd.mapPartitionsWithIndex(make_part_filter(part_id), True)
        data_from_part_rdd = part_rdd.collect()
        vertex_list = set()

        for row in data_from_part_rdd:
            writer.writerow([....])  # fields elided in the original paste

        csv_file.flush()

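For reference, make_part_filter is not defined anywhere in this paste. Below is a minimal sketch of what it presumably looks like, assuming it simply keeps the rows of the requested partition index; the reconstruction is a guess, not the original helper:

def make_part_filter(part_id):
    # Returns a function usable with RDD.mapPartitionsWithIndex: it yields the
    # rows of partition `part_id` and an empty result for every other partition.
    def part_filter(index, iterator):
        if index == part_id:
            for row in iterator:
                yield row
    return part_filter
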
In the worker logs:
19/07/22 08:58:57 INFO Worker: Executor app-20190722085320-0000/2 finished with state KILLED exitStatus 143
14: 19/07/22 08:58:57 INFO ExternalShuffleBlockResolver: Application app-20190722085320-0000 removed, cleanupLocalDirs = true
14: 19/07/22 08:58:57 INFO Worker: Cleaning up local directories for application app-20190722085320-0000
5: 19/07/22 08:58:57 INFO Worker: Executor app-20190722085320-0000/1 finished with state KILLED exitStatus 143
7: 19/07/22 08:58:57 INFO Worker: Executor app-20190722085320-0000/14 finished with state KILLED exitStatus 143
...

And on the driver, the job fails with:

Traceback (most recent call last):
  File "/project/6008168/tamouze/RWLastVersion2207/module1.py", line 306, in <module>
    for part_id in range(joinDesrdd_df.rdd.getNumPartitions()):
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 88, in rdd
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o528.javaToPython.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange hashpartitioning(id_source#263, id_target#292, similarity#258, 1024)
+- *(11) HashAggregate(keys=[id_source#263, id_target#292, similarity#258], functions=[], output=[id_source#263, id_target#292, similarity#258])
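
Exit status 143 is 128 + 15, meaning the executor process received SIGTERM; in Spark deployments this usually means the executors were killed externally (memory pressure is a frequent cause), and collect()-ing every partition onto the driver in a loop is a common trigger. If the goal is simply a tab-separated dump of joinDesrdd_df, here is a sketch of a commonly used alternative that lets Spark write the files in parallel instead of funnelling everything through the driver; the output path is hypothetical:

joinDesrdd_df \
    .write \
    .option("sep", "\t") \
    .option("header", True) \
    .mode("overwrite") \
    .csv("/path/to/output")

Each partition becomes its own part-file under the output directory, so no single machine has to hold the whole result in memory.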