Untitled

a guest
Nov 18th, 2019
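import json

# Assumes an active SparkContext named `sc` (e.g. provided by the pyspark shell).
# Expects one JSON object per line; uncomment .sample(...) to test on a small subset.
table = sc.textFile("../messages.json")  # .sample(False, 0.0001, 12345)
dataset = table.map(json.loads)
dataset.persist()  # cache the parsed records, since they are reused below
dataset.take(1)    # peek at a single record
# Distinct (subreddit, author) pairs
sub_auth = dataset.map(lambda d: (d['subreddit'], d['author'])).distinct()
# Number of distinct subreddits each author appears in
auth_occ = sub_auth.map(lambda r: (r[1], 1)).reduceByKey(lambda a, b: a + b)
# sortBy is ascending by default, so this returns the 50 authors with the fewest subreddits
auth_occ.sortBy(lambda a: a[1]).take(50)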