Untitled
a guest · Dec 12th, 2019 · Python · 0.44 KB
textFile = spark.read.text('small.log')
textFile.count()

# Count lines mentioning each user
textFile.filter(textFile.value.contains('bob')).count()
textFile.filter(textFile.value.contains('alice')).count()
textFile.filter(textFile.value.contains('alice2')).count()

# Keep only bob's lines, then pull out the third tab-separated field.
# (The original Row/flatMap detour was broken: it called .rdd on an RDD
# and .split on a Row; mapping over each row's value string works directly.)
bob_rows = textFile.filter(textFile.value.contains('bob'))
bob_times = bob_rows.rdd.map(lambda row: row.value.split('\t')[2])
bob_times  # a lazy RDD; nothing runs until an action such as collect()
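The per-line parsing can be sanity-checked without a Spark cluster. This is a plain-Python sketch, assuming each log line is tab-separated with the timestamp in the third column (the actual layout of 'small.log' is not shown, so the column positions and the sample lines below are assumptions):

```python
def third_field(line):
    """Return the third tab-separated field of a log line."""
    return line.split('\t')[2]

# Hypothetical log lines in an assumed user<TAB>event<TAB>timestamp layout
sample = [
    'bob\tlogin\t2019-12-12T09:00:00',
    'alice\tlogout\t2019-12-12T09:05:00',
]

# Mirrors filter(...contains('bob')) followed by the map over row.value
times = [third_field(line) for line in sample if 'bob' in line]
```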