from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.sparkContext.parallelize([
    ("a", None), ("a", 1), ("a", -1), ("b", 3), ("b", 1)
]).toDF(["k", "v"])

# Default frame is unboundedPreceding..currentRow, so last() only
# sees rows up to and including the current one.
w = Window.partitionBy("k").orderBy("k", "v")

df.select(F.col("k"), F.last("v", ignorenulls=True).over(w).alias("v")).show()
+---+----+
|  k|   v|
+---+----+
|  b|   1|
|  b|   3|
|  a|null|
|  a|  -1|
|  a|   1|
+---+----+

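# The command that produced the next table is not in the paste. The result
# matches last("v", ignorenulls=True) over a frame spanning the whole
# partition; this is an assumed reconstruction, and "w_full" is a name
# introduced here, not from the original.
w_full = (Window.partitionBy("k").orderBy("k", "v")
          .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing))

df.select(F.col("k"), F.last("v", ignorenulls=True).over(w_full).alias("v")).show()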
+---+----+
|  k|   v|
+---+----+
|  b|   3|
|  b|   3|
|  a|   1|
|  a|   1|
|  a|   1|
+---+----+

df.orderBy("k", "v").show()
+---+----+
|  k|   v|
+---+----+
|  a|null|
|  a|  -1|
|  a|   1|
|  b|   1|
|  b|   3|
+---+----+

# Note: first() after groupBy() is not guaranteed to respect the preceding
# orderBy(); rows may be reordered by the shuffle, so this result is
# non-deterministic.
df.orderBy("k", "v").groupBy("k").agg(F.first("v")).show()
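
# Sketch (not in the original paste): a deterministic way to get one filled
# value per key is to reuse the full-partition window and deduplicate,
# instead of relying on first() after a shuffle.
df.select("k", F.last("v", ignorenulls=True).over(w_full).alias("v")) \
  .dropDuplicates(["k"]).show()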