a guest Oct 23rd, 2019 81 Never
- How to trim minutes and seconds from a date field in a PySpark DataFrame.
- Different approaches to do that
- Input : 2019-01-31 23:16:28
- Output : 2019-01-31 23:00:00
- Not efficient (string manipulation; needs concat and lit from pyspark.sql.functions)
- df.withColumn('tpep_pickup_datetime', concat(df.tpep_pickup_datetime.substr(1, 13), lit(':00:00')))
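- More efficient than the one mentioned above (needs floor, unix_timestamp and col from pyspark.sql.functions; floor truncates, whereas round would push 23:45:00 to the next hour)
- df.withColumn('tpep_pickup_datetime', (floor(unix_timestamp(col('tpep_pickup_datetime')) / 3600) * 3600).cast('timestamp'))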
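- The epoch arithmetic behind the second approach can be checked without a cluster. The following is a minimal plain-Python sketch, not PySpark code: the helper names truncate_to_hour and round_to_hour are hypothetical, and the input is assumed to be a 'YYYY-MM-DD HH:MM:SS' string interpreted as UTC. It shows why flooring the epoch seconds gives the wanted output while rounding can land on the next hour.

```python
from datetime import datetime, timezone

SECONDS_PER_HOUR = 3600
FMT = "%Y-%m-%d %H:%M:%S"


def _to_epoch(ts: str) -> int:
    # Assumption: the string carries no zone information, so treat it as UTC.
    parsed = datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def _from_epoch(seconds: int) -> str:
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(FMT)


def truncate_to_hour(ts: str) -> str:
    """Drop minutes and seconds by flooring the epoch seconds to a multiple of 3600."""
    floored = (_to_epoch(ts) // SECONDS_PER_HOUR) * SECONDS_PER_HOUR
    return _from_epoch(floored)


def round_to_hour(ts: str) -> str:
    """Round to the nearest hour; shown only for contrast with truncation."""
    rounded = round(_to_epoch(ts) / SECONDS_PER_HOUR) * SECONDS_PER_HOUR
    return _from_epoch(rounded)


if __name__ == "__main__":
    print(truncate_to_hour("2019-01-31 23:16:28"))
    print(truncate_to_hour("2019-01-31 23:45:00"))
    print(round_to_hour("2019-01-31 23:45:00"))
```

- Flooring on epoch seconds lines up with local hour boundaries only when the zone offset is a whole number of hours; for other zones the result would need checking.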