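// The number of partitions must be set explicitly, otherwise a memory exception is thrown
val groupedRDD = sc.parallelize(
  rdMapped.collect().flatMap { case (k, v) =>
    // Split on digits, whitespace, punctuation (except in-word apostrophes) and the “ character
    v.split("\\p{Digit}|\\p{Space}|[\\p{Punct}&&[^']]|(?<![a-zA-Z])'|'(?![a-zA-Z])|\\“")
      .filter(word => word.length > 1)                         // drop empty and one-character tokens
      .filter(word => !stopWords.contains(word.toLowerCase())) // drop stop words
      .map(x => (k + " " + x.toLowerCase(), 1))                // key = "<k> <word>", count = 1
  },
  100) // THIS IS WHAT WE NEED TO SET: numSlices = 100
// Sum the counts for each "<k> <word>" key
val redByKey = groupedRDD.reduceByKey((a, b) => a + b)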
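The paste does not define `sc`, `rdMapped` or `stopWords`, so it cannot be run on its own. The sketch below is a Spark-free approximation, assuming `rdMapped` holds `(key, text)` string pairs and `stopWords` is a `Set[String]`: it reuses the same delimiter regex and filters in a hypothetical `tokenize` helper, and replaces `reduceByKey` with a local `groupBy`-based sum in a hypothetical `countByKey` helper. The sample documents and stop-word list are made up for illustration.

```scala
object WordCountSketch {
  // Same delimiter regex as the paste: digits, whitespace, ASCII punctuation
  // except the apostrophe, apostrophes not inside a word, and the “ character
  val Delimiters: String =
    "\\p{Digit}|\\p{Space}|[\\p{Punct}&&[^']]|(?<![a-zA-Z])'|'(?![a-zA-Z])|\\“"

  // Hypothetical helper mirroring the flatMap body: emit ("<key> <word>", 1) per kept token
  def tokenize(key: String, text: String, stopWords: Set[String]): Seq[(String, Int)] =
    text.split(Delimiters).toList
      .filter(word => word.length > 1)                          // drop empty and one-character tokens
      .filter(word => !stopWords.contains(word.toLowerCase()))  // drop stop words
      .map(x => (key + " " + x.toLowerCase(), 1))

  // Hypothetical local stand-in for reduceByKey((a, b) => a + b): sum the counts per key
  def countByKey(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val stopWords = Set("the", "and")          // hypothetical stop-word list
    val docs = Seq(                            // hypothetical (key, text) records
      ("doc1", "The cat and the CAT sat"),
      ("doc2", "A 'quoted' word, 42 times")
    )
    val counts = countByKey(docs.flatMap { case (k, v) => tokenize(k, v, stopWords) })
    // prints (doc1 cat,2), (doc1 sat,1), (doc2 quoted,1), (doc2 times,1), (doc2 word,1), one per line
    counts.toSeq.sorted.foreach(println)
  }
}
```

In Spark itself, an alternative to the paste's approach is to apply the same body with `rdMapped.flatMap { ... }` and pass the partition count to `reduceByKey(_ + _, 100)`. This keeps the data distributed instead of pulling it all onto the driver with `collect()` and re-distributing it with `parallelize`.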