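# Read the data from a file (sc is an existing SparkContext, logFile is the input path).
# Assumption: each line has the form "key<TAB>value"; adjust the split to the real log format.
logData = sc.textFile(logFile).map(lambda line: tuple(line.split("\t", 1)))
X = 10
# For each key, count its entries and keep the keys that occur more than X times.
outliers = logData.map(lambda kv: (kv[0], 1)).reduceByKey(lambda a, b: a + b).filter(lambda kv: kv[1] > X).cache()
# Filter out every entry whose key is one of the outliers.
reducedLogData = logData.subtractByKey(outliers).cache()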
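
To check the filtering logic on a small sample without a Spark cluster, the same idea can be sketched in plain Python. This is only an illustration: `drop_frequent_keys` and the `sample` data are hypothetical names introduced here, not part of Spark or of the snippet above.

```python
from collections import Counter

def drop_frequent_keys(pairs, threshold):
    """Return the (key, value) pairs whose key occurs at most `threshold` times."""
    pairs = list(pairs)
    # Count how many entries each key has (the map + reduceByKey step).
    counts = Counter(key for key, _ in pairs)
    # Keys occurring more than `threshold` times are the outliers (the filter step).
    outliers = {key for key, n in counts.items() if n > threshold}
    # Keep only entries whose key is not an outlier (the subtractByKey step).
    return [(key, value) for key, value in pairs if key not in outliers]

# Hypothetical sample data: "a" occurs 3 times, "b" once, "c" twice.
sample = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("c", 5), ("c", 6)]
print(drop_frequent_keys(sample, 2))
```

With a threshold of 2, every entry for "a" is dropped, while "b" and "c" are kept because neither occurs more than twice.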