Untitled

a guest
Jan 19th, 2018
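
# pyspark -- partition by key
# Give every distinct key of a pair RDD its own partition.

def partition_by_key(rdd):
    # Collect the distinct keys to the driver (only sensible when there are few keys).
    keys = rdd.keys().distinct().collect()
    # Map each key to a partition index 0..n-1.
    key_lookup = dict(zip(keys, range(len(keys))))
    return rdd.partitionBy(len(key_lookup), partitionFunc=lambda k: key_lookup[k])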
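To see what the partitioner does without a Spark cluster, here is a minimal plain-Python sketch that imitates the effect of partition_by_key on a list of (key, value) pairs. The helpers `build_key_lookup` and `simulate_partition_by` are hypothetical and are not part of the Spark API; the bucketing loop only mirrors how PySpark's partitionBy places each record in partition `partitionFunc(k) % numPartitions`.

```python
from collections import defaultdict


def build_key_lookup(keys):
    """Assign each distinct key a partition index 0..n-1, in first-seen order."""
    distinct = list(dict.fromkeys(keys))
    return dict(zip(distinct, range(len(distinct))))


def simulate_partition_by(pairs):
    """Hypothetical plain-Python stand-in for partition_by_key.

    Returns a list of partitions; partition i holds every pair whose key maps to i.
    """
    key_lookup = build_key_lookup(k for k, _ in pairs)
    num_partitions = len(key_lookup)
    buckets = defaultdict(list)
    for k, v in pairs:
        # Imitates PySpark, which buckets each record by partitionFunc(k) % numPartitions.
        buckets[key_lookup[k] % num_partitions].append((k, v))
    return [buckets[i] for i in range(num_partitions)]


pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
for i, part in enumerate(simulate_partition_by(pairs)):
    print(i, part)
```

Each key ends up alone in its own partition, with all of its values together. In real Spark the order returned by distinct().collect() is not guaranteed, so which index a given key receives may differ between runs; only the one-key-per-partition property is the point.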