Spark

a guest
Nov 3rd, 2018
>>> from pyspark.sql import functions as F
>>> df1 = spark.read.format("csv").option("inferSchema", "true").option("header","true").load('file:///info_new2.txt')
WARN util.SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
>>> df1.show()
+---+-------+--------+-------+----+------+-----+
| id| circle|operator|   info|prio| info1|prio1|
+---+-------+--------+-------+----+------+-----+
|  1|  delhi|  airtel|   1234|1.05|   212|  1.8|
|  2|lucknow|vodafone|  23412|1.01|  2321|  1.5|
|  3|gurgaon|    idea|     21|1.05|123123|  1.0|
|  4|chennai|  airtel|   1232| 1.1|    12|  1.1|
|  2|lucknow|vodafone|3432423| 1.6|123213|  1.1|
|  3|    ggn|   ideas|  34324| 1.4| 23213|  1.9|
+---+-------+--------+-------+----+------+-----+

>>> # Per-id minimum of each column. min() on the string columns is lexicographic,
>>> # which is why id 3 ends up with circle 'ggn' and operator 'idea'.
>>> df3 = df1.groupBy('id').agg(F.min('prio').alias('prio'),F.min('prio1').alias('prio1'),F.min('circle').alias('circle'),F.min('operator').alias('operator')).sort('id')
>>> df3.show()
+---+----+-----+-------+--------+
| id|prio|prio1| circle|operator|
+---+----+-----+-------+--------+
|  1|1.05|  1.8|  delhi|  airtel|
|  2|1.01|  1.1|lucknow|vodafone|
|  3|1.05|  1.0|    ggn|    idea|
|  4| 1.1|  1.1|chennai|  airtel|
+---+----+-----+-------+--------+

>>> # Join the minima back to df1 to recover the info/info1 value from the matching row.
>>> # If a minimum is tied within an id, these joins return one row per tied record.
>>> df4=df3.join(df1, ["id", "prio"]).select(["id","prio","info"])
>>> df5=df3.join(df1, ["id", "prio1"]).select(["id","prio1","info1"])
>>>
>>> # Combine both lookups into one row per id.
>>> df6=df4.join(df5, ["id"])
>>> df4.show()
+---+----+-----+
| id|prio| info|
+---+----+-----+
|  1|1.05| 1234|
|  2|1.01|23412|
|  3|1.05|   21|
|  4| 1.1| 1232|
+---+----+-----+

>>> df5.show()
+---+-----+------+
| id|prio1| info1|
+---+-----+------+
|  1|  1.8|   212|
|  2|  1.1|123213|
|  3|  1.0|123123|
|  4|  1.1|    12|
+---+-----+------+

>>> df6.show()
+---+----+-----+-----+------+
| id|prio| info|prio1| info1|
+---+----+-----+-----+------+
|  1|1.05| 1234|  1.8|   212|
|  2|1.01|23412|  1.1|123213|
|  3|1.05|   21|  1.0|123123|
|  4| 1.1| 1232|  1.1|    12|
+---+----+-----+-----+------+
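
The session above picks, for each id, the info value from the row with the smallest prio and the info1 value from the row with the smallest prio1. As a sanity check that needs no Spark session, the same lookup can be sketched in plain Python over the six rows printed by df1.show(). The helper name pick_min and the tuple layout are illustrative assumptions, not part of the original paste. Ties are resolved by first occurrence here, whereas the join-based approach would return every tied row.

```python
# Rows copied from the df1.show() output: (id, info, prio, info1, prio1).
rows = [
    (1, 1234, 1.05, 212, 1.8),
    (2, 23412, 1.01, 2321, 1.5),
    (3, 21, 1.05, 123123, 1.0),
    (4, 1232, 1.1, 12, 1.1),
    (2, 3432423, 1.6, 123213, 1.1),
    (3, 34324, 1.4, 23213, 1.9),
]


def pick_min(rows, key_idx, value_idx):
    """For each id, return (smallest key, value taken from that same row)."""
    best = {}
    for row in rows:
        rid, key, value = row[0], row[key_idx], row[value_idx]
        # Keep the first row seen with the strictly smallest key.
        if rid not in best or key < best[rid][0]:
            best[rid] = (key, value)
    return best


by_prio = pick_min(rows, key_idx=2, value_idx=1)    # id -> (prio, info)
by_prio1 = pick_min(rows, key_idx=4, value_idx=3)   # id -> (prio1, info1)

# One tuple per id: (id, prio, info, prio1, info1), the column order of df6.
result = [(rid, *by_prio[rid], *by_prio1[rid]) for rid in sorted(by_prio)]
for r in result:
    print(r)
```

The printed tuples can be compared row by row with the df6.show() output.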