- id value
- 0 A
- 1 A
- 2 B
- 3 C
- 4 A
- 5 A
- 6 A
- 7 B
- id value
- 0 A
- 2 B
- 3 C
- 4 A
- 7 B
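- from pyspark.sql import Window
- from pyspark.sql.functions import col, lag, when
-
- # Assign each row to a block so the window is not evaluated over one single partition.
- df_with_block = df.withColumn(
-     "block", (col("id") / df.rdd.getNumPartitions()).cast("int"))
- window = Window.partitionBy("block").orderBy("id")
- # True when the previous row in the same block has a different value, or when there is no previous row.
- get_last = when(lag("value", 1).over(window) == col("value"), False).otherwise(True)
- reduced_df = (df_with_block.withColumn("reduced", get_last)
-     .where(col("reduced")).drop("reduced"))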
- id value
- 0 A
- 2 B
- 3 C
- 4 A
- 6 A
- 7 B
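
The extra row (id 6) in the last table appears to come from the block boundary: `lag` returns null for the first row of every block, so that row is always kept even when the previous block ended with the same value. The following is a minimal plain-Python sketch of that effect, not the Spark code itself. The helper names `dedupe_consecutive` and `dedupe_per_block` are hypothetical, and the divisor of 2 is an assumption (the paste does not show what `getNumPartitions()` returned; 2 is one value consistent with the output above).

```python
from itertools import groupby

# Sample rows copied from the paste: (id, value).
rows = [(0, "A"), (1, "A"), (2, "B"), (3, "C"),
        (4, "A"), (5, "A"), (6, "A"), (7, "B")]

def dedupe_consecutive(rows):
    """Keep a row only if the previous row (by id) has a different value."""
    kept, prev = [], object()  # sentinel: the first row is always kept
    for row_id, value in sorted(rows):
        if value != prev:
            kept.append((row_id, value))
        prev = value
    return kept

def dedupe_per_block(rows, divisor):
    """Mimic the windowed version: the comparison restarts in every block."""
    kept = []
    for _, block in groupby(sorted(rows), key=lambda r: r[0] // divisor):
        kept.extend(dedupe_consecutive(list(block)))
    return kept

global_ids = [i for i, _ in dedupe_consecutive(rows)]
block_ids = [i for i, _ in dedupe_per_block(rows, divisor=2)]  # divisor is assumed
print(global_ids)  # [0, 2, 3, 4, 7]
print(block_ids)   # [0, 2, 3, 4, 6, 7]
```

With the comparison running across all rows, id 6 is dropped and the result matches the second table. In PySpark, a window defined with only `Window.orderBy("id")` (no `partitionBy`) compares across the whole DataFrame in the same way, at the cost of moving all rows into a single partition.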