Untitled

a guest
Jan 18th, 2018
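from pyspark.streaming import StreamingContext

# Assumes an existing SparkContext named sc (for example, the one the pyspark shell provides).
ssc = StreamingContext(sc, 10)  # 10-second batch interval
lines = ssc.socketTextStream("gw01.itversity.com", 19999)

# Token 6 of each line is treated as the request path, e.g. /department/<name>/...
departmentData = lines.filter(lambda s: s.split()[6].split("/")[1] == "department")
departmentTuples = departmentData.map(lambda s: (s.split()[6].split("/")[2], 1))

# Arguments: reduce function, invFunc (None = no inverse reduce), 30 s window, 10 s slide.
countByDepartment = departmentTuples.reduceByKeyAndWindow(lambda x, y: x + y, None, 30, 10)
#countByDepartment = departmentTuples.reduceByKey(lambda x, y: x + y)
#countByDepartment.pprint()
countByDepartment.saveAsTextFiles("/user/dgadiraju/streaming_count_by_department")

ssc.start()
ssc.awaitTermination()  # block so the job keeps running when launched as a script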
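
The filter and map steps index straight into the split line, so a short line or a path such as /department with no name after it would raise an IndexError inside the stream. A minimal sketch of a more defensive parser follows, in plain Python so it can be tried without Spark. The helper name department_of and the sample lines are hypothetical, and the sketch assumes the input resembles a web access log whose seventh whitespace-separated token is the request path; the paste itself does not show the log format.

```python
def department_of(line):
    """Return the department name from one log line, or None if there is none."""
    fields = line.split()
    if len(fields) < 7:
        return None  # too few tokens to contain a request path
    parts = fields[6].split("/")
    # A path like /department/fitness/products splits into
    # ['', 'department', 'fitness', 'products'].
    if len(parts) > 2 and parts[1] == "department" and parts[2]:
        return parts[2]
    return None


# Hypothetical sample lines, made up for illustration only.
sample = [
    '1.2.3.4 - - [18/Jan/2018:10:00:00 -0800] "GET /department/fitness/products HTTP/1.1" 200 1024',
    '1.2.3.4 - - [18/Jan/2018:10:00:01 -0800] "GET /department HTTP/1.1" 200 512',
    '1.2.3.4 - - [18/Jan/2018:10:00:02 -0800] "GET /product/123 HTTP/1.1" 200 256',
    'garbage',
]
names = [department_of(s) for s in sample]
```

In the streaming job, one way to use such a helper would be lines.map(department_of).filter(lambda d: d is not None).map(lambda d: (d, 1)), which replaces the two lambdas that call s.split() twice per line.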