Untitled

a guest
Aug 29th, 2015
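
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# reading data from a file; logFile (the input path) is assumed to be defined elsewhere
# assumption: the key is the first whitespace-separated field of each line, the value is the whole line
logData = (sc.textFile(logFile)
           .filter(lambda line: line.strip())  # skip blank lines
           .map(lambda line: (line.split()[0], line)))

X = 10

# for each key, count its entries and keep the keys that occur more than X times
# (Python 3 lambdas cannot unpack tuples in their parameter list, so index into the pair)
outliers = (logData.map(lambda kv: (kv[0], 1))
            .reduceByKey(lambda a, b: a + b)
            .filter(lambda kv: kv[1] > X)
            .cache())

# filtering these entries out: drop every record whose key is an outlier
reducedLogData = logData.subtractByKey(outliers).cache()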
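To check the filtering logic without a Spark cluster, here is a minimal pure-Python model of the same three steps (count per key, keep keys with count > X, drop records with those keys). It assumes the records are already (key, value) tuples; the function name `drop_frequent_keys` is hypothetical and not part of PySpark.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def drop_frequent_keys(records: Iterable[Tuple[K, V]], x: int) -> List[Tuple[K, V]]:
    """Hypothetical local model of the Spark pipeline.

    Counts entries per key (like map + reduceByKey), collects the keys that
    occur more than x times (like filter), and removes every record with one
    of those keys (like subtractByKey).
    """
    records = list(records)  # materialize so the input can be read twice
    counts = Counter(k for k, _ in records)
    outliers = {k for k, n in counts.items() if n > x}
    return [(k, v) for k, v in records if k not in outliers]


if __name__ == "__main__":
    data = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("c", 5), ("c", 6)]
    # "a" occurs 3 times (> 2), so all of its records are dropped
    print(drop_frequent_keys(data, 2))  # [('b', 4), ('c', 5), ('c', 6)]
```

Unlike this list-based model, Spark's `subtractByKey` does not guarantee that the surviving records keep their input order.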