
Untitled

a guest
Jul 2nd, 2015
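# assumes `sc` is an existing SparkContext (e.g. the one the pyspark shell provides)
wordsList = ['cat', 'elephant', 'rat', 'rat', 'cat']
wordsRDD = sc.parallelize(wordsList, 4)

# pair each word with an initial count of 1
wordPairs = wordsRDD.map(lambda word: (word, 1))

# add up the 1s for each distinct word
wordCounts = wordPairs.reduceByKey(lambda x, y: x + y)
print(wordCounts.collect())

# prints: [('rat', 2), ('elephant', 1), ('cat', 2)]  (element order may vary)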
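To make the reduceByKey step concrete, here is a plain-Python sketch that runs without Spark. `reduce_by_key` is a hypothetical helper written for illustration, not a Spark API: it merges the values of each key with a two-argument function, which is what reduceByKey does with the (word, 1) pairs. Spark performs this merge within each partition and then across partitions, so the order of its result can differ from this sketch.

```python
def reduce_by_key(pairs, func):
    """Hypothetical stand-in for RDD.reduceByKey: merge the values of each key with func."""
    merged = {}
    for key, value in pairs:
        if key in merged:
            # combine the running value for this key with the new one
            merged[key] = func(merged[key], value)
        else:
            # first time this key is seen: start from its value
            merged[key] = value
    return list(merged.items())


words_list = ['cat', 'elephant', 'rat', 'rat', 'cat']
word_pairs = [(word, 1) for word in words_list]  # like wordsRDD.map(lambda w: (w, 1))
word_counts = reduce_by_key(word_pairs, lambda x, y: x + y)
print(word_counts)  # [('cat', 2), ('elephant', 1), ('rat', 2)]
```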
  9.  
from operator import add

totalCount = (wordCounts
              .map(<< FILL IN >>)
              .reduce(<< FILL IN >>))

print(totalCount)  # should print 5

# wordCounts.values().sum() does the trick, but I want to do this with map() and reduce()
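One way to fill in the two blanks, assuming the standard RDD semantics, is `.map(lambda pair: pair[1])` followed by `.reduce(add)`: map turns each (word, count) tuple into its count, and reduce folds those counts pairwise with the two-argument `add` from the operator module. The sketch below mimics that pipeline on a plain Python list so it runs without Spark; `collected` is a hypothetical stand-in for the output of `wordCounts.collect()`.

```python
from functools import reduce
from operator import add

# Hypothetical local stand-in for wordCounts.collect(): (word, count) pairs.
collected = [('rat', 2), ('elephant', 1), ('cat', 2)]

# map(): keep only the count from each (word, count) tuple.
counts = map(lambda pair: pair[1], collected)

# reduce(): combine the counts two at a time with add(a, b).
total_count = reduce(add, counts)

print(total_count)  # 5
```

The key difference from `wordCounts.values().sum()` is only in form: `values()` is itself a map that drops the keys, and `sum()` is a reduce with addition.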
  18.  
  19.  
I need to use a reduce() action to sum the counts in wordCounts and then divide by the number of unique words.
  21.  
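What I tried:

# Attempt 1: each element is a (word, count) tuple, which has no .values(),
# and reduce() expects a function of two arguments, not one
.map(lambda x:x.values())
.reduce(lambda x:sum(x)))

and

# Attempt 2: the argument parses as a generator expression over an outer `d`,
# so map() is not given a function; the elements are tuples, not dicts, anyway
.map(lambda d:d[k] for k in d)
.reduce(lambda x:sum(x)))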
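The second half of the task, dividing the total by the number of unique words, can be sketched the same way. Each element of wordCounts is one distinct word, so in PySpark the denominator would presumably be `wordCounts.count()`. The plain-Python version below uses `len()` on a hypothetical `collected` list instead, so it runs without Spark, and converts to float before dividing because under Python 2 `5 / 3` would truncate to 1.

```python
from functools import reduce
from operator import add

# Hypothetical local stand-in for wordCounts.collect().
collected = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Sum the counts with map() + reduce(), as in the exercise.
total_count = reduce(add, map(lambda pair: pair[1], collected))

# Number of unique words = number of (word, count) pairs.
unique_words = len(collected)

# Force true division so the mean is not truncated on Python 2.
average = total_count / float(unique_words)

print(round(average, 2))  # 1.67
```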