holdenk

Untitled

Nov 14th, 2015
// Ask for at least 20 partitions so the shuffle cost is visible
val rdd = sc.textFile("python/pyspark/*.py", 20)
// Evil groupByKey version: every (word, 1) pair is shuffled before it is summed
val words = rdd.flatMap(_.split(" "))
val wordPairs = words.map((_, 1))
val grouped = wordPairs.groupByKey()
val evilWordCounts = grouped.mapValues(_.sum)
evilWordCounts.take(5)
// Less evil version: reduceByKey sums within each partition before the shuffle
val wordCounts = wordPairs.reduceByKey(_ + _)
wordCounts.take(5)
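
The difference between the two versions can be sketched without a cluster. The block below is a rough illustration in plain Scala collections, not Spark code: the object name `ShuffleSketch`, the sample words, and the use of nested `Seq`s as stand-ins for partitions are all assumptions made for the example. It counts how many records would have to cross the shuffle in each style.

```scala
// Hypothetical sketch: plain Scala collections standing in for an RDD.
object ShuffleSketch {
  // Each inner Seq plays the role of one partition (made-up sample data).
  val partitions: Seq[Seq[String]] = Seq(
    Seq("a", "b", "a"),
    Seq("b", "a", "c")
  )

  // groupByKey style: every (word, 1) pair crosses the "shuffle" unchanged,
  // and the summing happens only afterwards.
  val shuffledPairs: Seq[(String, Int)] =
    partitions.flatMap(_.map(w => (w, 1)))
  val groupCounts: Map[String, Int] =
    shuffledPairs.groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }

  // reduceByKey style: sum within each partition first, so at most one
  // record per distinct word per partition crosses the "shuffle".
  val partialSums: Seq[(String, Int)] = partitions.flatMap { part =>
    part.groupBy(identity).map { case (w, ws) => (w, ws.size) }
  }
  val reduceCounts: Map[String, Int] =
    partialSums.groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }
}
```

Both styles produce the same counts, but `ShuffleSketch.shuffledPairs` holds one record per word occurrence while `ShuffleSketch.partialSums` holds one record per distinct word per partition. On real text with many repeated words, that gap is roughly why the `reduceByKey` version tends to move far less data.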