Untitled

a guest
Oct 23rd, 2019

How to truncate the minutes and seconds from a timestamp field in a PySpark DataFrame.
Different approaches to do that:

Input:  2019-01-31 23:16:28
Output: 2019-01-31 23:00:00

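Not efficient: string manipulation. Keep the first 13 characters ("yyyy-MM-dd HH"; substr is 1-based) and append ":00:00", then cast back to a timestamp. This assumes the value renders as "yyyy-MM-dd HH:mm:ss".

from pyspark.sql.functions import col, concat, lit
df = df.withColumn('tpep_pickup_datetime', concat(col('tpep_pickup_datetime').substr(1, 13), lit(':00:00')).cast('timestamp'))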
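
A plain-Python sketch of the same slicing logic on a single value, outside Spark, may make the idea clearer. It assumes the fixed "yyyy-MM-dd HH:mm:ss" layout; the helper name truncate_hour_str is hypothetical and not part of PySpark.

```python
from datetime import datetime

def truncate_hour_str(ts: str) -> str:
    """Keep 'yyyy-MM-dd HH' (first 13 chars) and append ':00:00'.

    Mirrors concat(substr(1, 13), lit(':00:00')); assumes the fixed
    'yyyy-MM-dd HH:mm:ss' layout, so any other format breaks it.
    """
    return ts[:13] + ":00:00"

truncated = truncate_hour_str("2019-01-31 23:16:28")
print(truncated)  # 2019-01-31 23:00:00

# The result is still a string; parse it if a real datetime is needed.
print(datetime.strptime(truncated, "%Y-%m-%d %H:%M:%S"))  # 2019-01-31 23:00:00
```

The string round-trip (format, slice, concatenate, parse) is the reason this approach is the slower of the two.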
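
More efficient than the approach above: epoch arithmetic. Convert to epoch seconds, floor to a multiple of 3600, and convert back. Use floor, not round: round goes to the nearest hour, so 23:45:00 would become 00:00:00 of the next day. Flooring epoch seconds aligns to UTC hours, so it matches the expected output only when the session time zone has a whole-hour offset.

from pyspark.sql.functions import col, floor, from_unixtime, unix_timestamp
df = df.withColumn('tpep_pickup_datetime', from_unixtime(floor(unix_timestamp(col('tpep_pickup_datetime')) / 3600) * 3600).cast('timestamp'))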
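
The floor-versus-round difference is easy to check with plain arithmetic. The sketch below works on epoch seconds in UTC with the Python standard library only (no Spark); the helper names truncate_hour_epoch, round_hour_epoch, utc and fmt are hypothetical.

```python
from datetime import datetime, timezone

def truncate_hour_epoch(epoch_seconds: int) -> int:
    """Floor epoch seconds to the start of the hour (what floor(x / 3600) * 3600 does)."""
    return (epoch_seconds // 3600) * 3600

def round_hour_epoch(epoch_seconds: int) -> int:
    """Round epoch seconds to the nearest hour (what round(x / 3600) * 3600 does)."""
    return round(epoch_seconds / 3600) * 3600

def utc(s: str) -> int:
    # Parse 'yyyy-MM-dd HH:mm:ss' as UTC and return whole epoch seconds.
    dt = datetime.strptime(s, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

def fmt(epoch_seconds: int) -> str:
    # Format epoch seconds back to 'yyyy-MM-dd HH:mm:ss' in UTC.
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

for s in ["2019-01-31 23:16:28", "2019-01-31 23:45:00"]:
    e = utc(s)
    print(s, "floor:", fmt(truncate_hour_epoch(e)), "round:", fmt(round_hour_epoch(e)))
# 2019-01-31 23:16:28 floor: 2019-01-31 23:00:00 round: 2019-01-31 23:00:00
# 2019-01-31 23:45:00 floor: 2019-01-31 23:00:00 round: 2019-02-01 00:00:00
```

Both give the expected output for 23:16:28, which hides the bug; only a value past the half hour shows that round moves to the next hour (and here the next day).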