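# pyspark -- partition by key
# Repartition a pair RDD so that each distinct key gets its own partition.
def partition_by_key(x):
    # Collect the distinct keys to the driver (practical only when there are few).
    key_lookup = x.keys().distinct().collect()
    # Map each distinct key to a partition index 0..n-1.
    key_lookup = dict(zip(key_lookup, range(len(key_lookup))))
    # partitionBy puts each (k, v) pair into partition partitionFunc(k) % n.
    return x.partitionBy(len(key_lookup), partitionFunc=lambda k: key_lookup[k])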
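To check the idea without a Spark cluster, here is a minimal plain-Python model of the same logic. `simulate_partition_by_key` is a hypothetical helper, not part of PySpark; it assumes the bucketing rule `RDD.partitionBy` uses, where each pair lands in partition `partitionFunc(k) % numPartitions`. Because every key gets a unique index below `numPartitions`, each partition ends up holding exactly one key.

```python
def simulate_partition_by_key(pairs):
    """Hypothetical plain-Python model of partition_by_key (not PySpark code).

    Returns a list of partitions, one per distinct key, using the same
    placement rule as RDD.partitionBy: partitionFunc(k) % numPartitions.
    """
    # Stand-in for x.keys().distinct().collect(). First-seen order is used
    # here; Spark's distinct() makes no ordering guarantee.
    distinct_keys = list(dict.fromkeys(k for k, _ in pairs))
    key_lookup = dict(zip(distinct_keys, range(len(distinct_keys))))
    num_partitions = len(key_lookup)

    # One bucket per partition; place each pair by its key's index.
    partitions = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        partitions[key_lookup[k] % num_partitions].append((k, v))
    return partitions


parts = simulate_partition_by_key([("a", 1), ("b", 2), ("a", 3), ("c", 4)])
print(parts)  # [[('a', 1), ('a', 3)], [('b', 2)], [('c', 4)]]
```

On a real RDD, `partition_by_key(rdd).glom().collect()` returns the contents of each partition as a list, which is a quick way to confirm that every partition holds a single key. Since the distinct keys are collected to the driver and the lookup dict is shipped with the closure, this approach suits a modest number of keys.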