// Assumes `sc` is an existing SparkContext, as in spark-shell.
val rdd = sc.textFile("python/pyspark/*.py", 20) // Ask for at least 20 partitions so there are many to shuffle across

// Evil groupByKey version: every (word, 1) pair is shuffled before being summed
val words = rdd.flatMap(_.split(" "))
val wordPairs = words.map((_, 1))
val grouped = wordPairs.groupByKey()
val evilWordCounts = grouped.mapValues(_.sum)
evilWordCounts.take(5)

// Less evil version: reduceByKey sums within each partition before shuffling
val wordCounts = wordPairs.reduceByKey(_ + _)
wordCounts.take(5)
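
// A quick sanity check, sketched under the assumption that the full result
// fits in driver memory: the two versions should yield identical counts.
// `sameCounts` is an illustrative name, not part of the original paste.
val sameCounts = evilWordCounts.collect().toMap == wordCounts.collect().toMap
println(s"Both versions agree: $sameCounts")

// Print each lineage to compare how the two jobs are staged around the shuffle.
println(evilWordCounts.toDebugString)
println(wordCounts.toDebugString)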