Dundre32

Untitled

Apr 29th, 2020
// The number of partitions must be set explicitly; otherwise the job fails with a memory exception.

// Split each value into words, drop short words and stop words,
// and emit ("<key> <word>", 1) pairs for counting.
val groupedRDD = sc.parallelize(
  rdMapped.collect().flatMap { case (k, v) =>
    v.split("\\p{Digit}|\\p{Space}|[\\p{Punct}&&[^']]|(?<![a-zA-Z])'|'(?![a-zA-Z])|\\“")
      .filter(word => word.length > 1)
      .filter(word => !stopWords.contains(word.toLowerCase()))
      .map(x => (k + " " + x.toLowerCase(), 1))
  },
  100) // numSlices: THIS IS WHAT WE NEED TO SET

// Sum the counts for each "<key> <word>" pair.
val redByKey = groupedRDD.reduceByKey((a, b) => a + b)
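
To see what the split pattern and the two filters actually produce, here is a minimal sketch that runs the same steps on plain Scala collections, without Spark. The object name `TokenizeSketch`, the helpers `keyedWords` and `countByKey`, the sample stop-word list and the sample sentence are all hypothetical and only stand in for the real `stopWords` and `rdMapped` data, which are not shown in the paste. The `groupBy`-and-sum step is a local stand-in for `reduceByKey`, not the Spark call itself.

```scala
object TokenizeSketch {
  // Same split pattern as in the paste: split on digits, whitespace,
  // punctuation other than the apostrophe, apostrophes that are not
  // between two letters, and the left curly quote.
  val splitPattern: String =
    "\\p{Digit}|\\p{Space}|[\\p{Punct}&&[^']]|(?<![a-zA-Z])'|'(?![a-zA-Z])|\\“"

  // Hypothetical stop-word list; the paste does not show the real one.
  val stopWords: Set[String] = Set("the", "a", "of")

  // Mirrors the flatMap body: one ("<key> <word>", 1) pair per kept word.
  def keyedWords(key: String, text: String): Seq[(String, Int)] =
    text.split(splitPattern).toSeq
      .filter(word => word.length > 1)
      .filter(word => !stopWords.contains(word.toLowerCase))
      .map(word => (key + " " + word.toLowerCase, 1))

  // Local stand-in for reduceByKey((a, b) => a + b).
  def countByKey(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val pairs = keyedWords("doc1", "Don't stop the 3 dogs, the dogs' owner said.")
    countByKey(pairs).toSeq.sortBy(_._1).foreach(println)
  }
}
```

With this sample input, "Don't" survives as a single word because its apostrophe sits between two letters, while the trailing apostrophe in "dogs'" is treated as a separator, so "doc1 dogs" is counted twice.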