Untitled

a guest
Apr 19th, 2019
```python
from pyspark.sql.functions import udf

# Map the three-letter month abbreviations used in Apache/CLF logs to month numbers.
month_map = {
    'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6, 'Jul': 7,
    'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12
}


def parse_clf_time(text):
    """Convert a Common Log Format timestamp into a timestamp string.

    Args:
        text (str): date and time in Apache time format
            [dd/mmm/yyyy:hh:mm:ss (+/-)zzzz]
    Returns:
        str: "yyyy-mm-dd hh:mm:ss", suitable for casting to a timestamp
            (e.g. CAST(... AS TIMESTAMP))
    """
    # NOTE: The time zone offset is ignored here; it may need to be handled
    # depending on the problem you are solving.
    return "{0:04d}-{1:02d}-{2:02d} {3:02d}:{4:02d}:{5:02d}".format(
        int(text[7:11]),       # year
        month_map[text[3:6]],  # month
        int(text[0:2]),        # day
        int(text[12:14]),      # hour
        int(text[15:17]),      # minute
        int(text[18:20])       # second
    )
```
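Because `parse_clf_time` drops the `(+/-)zzzz` offset, timestamps from servers in different time zones end up on different clocks. A minimal sketch of one way to honour the offset, assuming the standard library's `datetime.strptime` with the `%z` directive and the default C locale (so `%b` matches `Jan`..`Dec`); the name `parse_clf_time_utc` is hypothetical and not part of the original paste:

```python
from datetime import datetime, timezone


def parse_clf_time_utc(text):
    """Hypothetical variant of parse_clf_time that converts to UTC.

    Parses "dd/mmm/yyyy:hh:mm:ss (+/-)zzzz", applies the UTC offset, and
    returns "yyyy-mm-dd hh:mm:ss" in UTC. Assumes the C/English locale so
    that %b matches the three-letter month abbreviations.
    """
    dt = datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


print(parse_clf_time_utc("01/Jul/1995:00:00:01 -0400"))  # prints 1995-07-01 04:00:01
```

It returns the same string layout as `parse_clf_time`, so under these assumptions it could be used in its place wherever the result is later cast to a timestamp; the trade-off is that the conversion can roll the date forward or back (e.g. a late-evening local time becomes the next day in UTC).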