Spark Jan 2015

By franji, Jan 21st, 2015
Accompanying document:
https://docs.google.com/document/d/14KiTyaq0c1jCPqgmyYuyr4lwDP3xpVrf8DXx60AJVjw/edit?usp=sharing

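// Average word length per first letter: straightforward version
// (run in spark-shell, where the SparkContext `sc` is predefined)
val f = sc.textFile("file:///Users/talfranji/Dropbox/research/langmodel.py")
val avglens = f.flatMap(_.split(" ")).filter(_.length > 0).
    map(word => (word(0), word.length)).       // (first character, word length)
    groupByKey.                                // ships every single length through the shuffle
    map { case (k, v) => (k, v.sum.toFloat / v.size) }
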
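To see what the pipeline above computes without a cluster, here is a minimal plain-Scala sketch (assuming Scala 2.13 or later, run as a script, e.g. with scala-cli) that applies the same steps to a local `Seq`. The input lines are made up for illustration, `avgLenGrouped` is a hypothetical helper name, and the collections method `groupBy` stands in for Spark's `groupByKey`.

```scala
// Plain-Scala sketch (no Spark) of the groupByKey version.
// Hypothetical input: a few lines of text standing in for the RDD `f`.
val lines = Seq("the quick brown fox", "jumps over  the lazy dog")

// Same steps on a local Seq: split into words, drop empty strings,
// key each word by its first character, then group all lengths per key.
def avgLenGrouped(lines: Seq[String]): Map[Char, Float] =
  lines
    .flatMap(_.split(" "))
    .filter(_.length > 0)
    .map(word => (word(0), word.length))
    .groupBy(_._1)                      // Map[Char, Seq[(Char, Int)]]
    .map { case (k, pairs) =>
      val v = pairs.map(_._2)           // every length for this key is held at once
      (k, v.sum.toFloat / v.size)
    }

println(avgLenGrouped(lines))
```

The double space in the second line makes `split(" ")` produce an empty string, which is why the `filter(_.length > 0)` step is needed.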
// Average word length per first letter: better performance
// reduceByKey merges (total, count) pairs within each partition before the shuffle,
// so at most one pair per key per partition crosses the network.
val avglens = f.flatMap(_.split(" ")).filter(_.length > 0).
    map(word => (word(0), (word.length, 1))).
    reduceByKey { case ((tot1, count1), (tot2, count2)) => (tot1 + tot2, count1 + count2) }.
    mapValues { case (tot, count) => tot.toFloat / count }
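
The gain comes from combining before the shuffle. The sketch below imitates that in plain Scala (assuming Scala 2.13 or later for `groupMapReduce`, run as a script): each hypothetical inner `Seq` plays the role of one partition, `combineLocally` and `merge` are illustrative names rather than Spark APIs, and `merge` is the same function passed to `reduceByKey`.

```scala
// Plain-Scala sketch (no Spark) of the reduceByKey version.
// Hypothetical partitions: each inner Seq stands in for one RDD partition.
val partitions: Seq[Seq[String]] = Seq(
  Seq("the quick brown fox"),
  Seq("jumps over  the lazy dog")
)

type SumCount = (Int, Int) // (total length, word count)

// Add totals and add counts. This is associative and commutative,
// so the pairs can be merged in any grouping or order.
def merge(a: SumCount, b: SumCount): SumCount = (a._1 + b._1, a._2 + b._2)

// Step 1 (map side): combine inside each partition before the "shuffle".
def combineLocally(lines: Seq[String]): Map[Char, SumCount] =
  lines
    .flatMap(_.split(" "))
    .filter(_.length > 0)
    .map(word => (word(0), (word.length, 1)))
    .groupMapReduce(_._1)(_._2)(merge)

val partial: Seq[Map[Char, SumCount]] = partitions.map(combineLocally)

// Step 2 (reduce side): merge at most one (total, count) per key per partition.
val merged: Map[Char, SumCount] =
  partial.flatten.groupMapReduce(_._1)(_._2)(merge)

// Step 3: divide only at the end, like mapValues in the Spark code.
val avgLens: Map[Char, Float] =
  merged.map { case (k, (tot, count)) => (k, tot.toFloat / count) }

println(avgLens)
```

Carrying (total, count) rather than per-partition averages matters: averaging partial averages would weight every partition equally regardless of how many words it holds, while summing totals and counts and dividing once at the end gives the exact mean.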