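import json
from pyspark import SparkContext

# Reuse the shell/notebook SparkContext if one exists, otherwise create one.
sc = SparkContext.getOrCreate()
# Each line of the file is parsed as one JSON object.
# Uncomment sample() to work on a small, seeded subset.
table = sc.textFile("../messages.json")  # .sample(False, 0.0001, 12345)
dataset = table.map(json.loads)
dataset.persist()  # cache the parsed records for reuse
dataset.take(1)  # peek at one record
# Unique (subreddit, author) pairs, then the number of distinct subreddits per author.
sub_auth = dataset.map(lambda d: (d['subreddit'], d['author'])).distinct()
auth_occ = sub_auth.map(lambda r: (r[1], 1)).reduceByKey(lambda a, b: a + b)
# sortBy is ascending by default: these are the 50 authors seen in the fewest subreddits.
auth_occ.sortBy(lambda a: a[1]).take(50)  # pass ascending=False for the most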