Untitled

a guest
Jul 16th, 2019
id value
0 A
1 A
2 B
3 C
4 A
5 A
6 A
7 B

id value
0 A
2 B
3 C
4 A
7 B

from pyspark.sql import Window
from pyspark.sql.functions import col, lag, when

# Assign each row to a block by dividing its id by the partition count.
df_with_block = df.withColumn(
    "block", (col("id") / df.rdd.getNumPartitions()).cast("int"))

# Order rows by id within each block.
window = Window.partitionBy("block").orderBy("id")

# True unless the previous row in the same block holds the same value.
get_last = when(lag("value", 1).over(window) == col("value"), False).otherwise(True)

# Keep only the flagged rows, then drop the helper column.
reduced_df = (df_with_block.withColumn("reduced", get_last)
              .where(col("reduced")).drop("reduced"))

id value
0 A
2 B
3 C
4 A
6 A
7 B
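
One reading of the listings above: the first table is the input, the second is the input with consecutive repeats removed, and the last is what the blocked window returns. The sketch below uses plain Python rather than Spark to show why row 6 survives. It assumes 2 partitions (the paste does not state the count), and the helper names dedupe_in_blocks and dedupe_global are hypothetical.

```python
# Sample rows copied from the first table: (id, value).
rows = [(0, "A"), (1, "A"), (2, "B"), (3, "C"),
        (4, "A"), (5, "A"), (6, "A"), (7, "B")]


def dedupe_in_blocks(rows, num_partitions):
    """Keep a row unless the previous row in the SAME block has the same value."""
    kept = []
    prev_block, prev_value = None, None
    for row_id, value in sorted(rows):
        # Mirrors (id / n).cast("int") for non-negative ids.
        block = row_id // num_partitions
        # The first row of every block has no predecessor, so it is always kept.
        if block != prev_block or value != prev_value:
            kept.append((row_id, value))
        prev_block, prev_value = block, value
    return kept


def dedupe_global(rows):
    """Keep a row unless the previous row overall has the same value."""
    kept = []
    prev_value = None
    for row_id, value in sorted(rows):
        if value != prev_value:
            kept.append((row_id, value))
        prev_value = value
    return kept


# Assumed partition count; the paste does not say what getNumPartitions() returned.
blocked = dedupe_in_blocks(rows, num_partitions=2)
unblocked = dedupe_global(rows)
print(blocked)
print(unblocked)
```

With 2 partitions, ids 6 and 7 form their own block, so row 6 is never compared with row 5 and is kept. In Spark, a window with no partitionBy, such as Window.orderBy("id"), compares across the whole dataset, but it pulls all rows into a single partition, which may be slow on large data.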