vbout

GROUP BY SPARK

Sep 30th, 2022
from pyspark.sql import SparkSession

APP_NAME = "DataFrames"
SPARK_URL = "local[*]"

# SPARK_URL was defined but never used; pass it as the master explicitly
spark = SparkSession.builder.appName(APP_NAME) \
        .master(SPARK_URL) \
        .config('spark.ui.showConsoleProgress', 'false') \
        .getOrCreate()

taxi = spark.read.load('/datasets/pickups_terminal_5.csv',
                       format='csv', header='true', inferSchema='true')

# Replace missing values with 0
taxi = taxi.fillna(0)

# registerTempTable is deprecated since Spark 2.0; use its replacement
taxi.createOrReplaceTempView("taxi")

# Average number of orders (pickups) per day across the 30-minute intervals.
# show() prints the table itself and returns None, so it is not wrapped in print().
taxi.groupBy("date").mean().select("date", "avg(pickups)").show()

# Days with the highest average number of orders in the table
taxi.groupBy("date").mean().select("date", "avg(pickups)") \
      .sort("avg(pickups)", ascending=False).show()
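
To make the two aggregations concrete, here is a plain-Python sketch of what grouping by date, averaging pickups, and sorting in descending order computes. It does not use Spark. The sample rows and the helper names mean_pickups_by_date and busiest_days are made up for illustration and are not taken from the real CSV.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (date, pickups in one 30-minute interval).
# These values are invented for illustration only.
rows = [
    ("2022-01-01", 10),
    ("2022-01-01", 20),
    ("2022-01-02", 40),
    ("2022-01-02", 50),
    ("2022-01-03", 5),
]


def mean_pickups_by_date(rows):
    """Group rows by date and return {date: mean pickups per interval}."""
    groups = defaultdict(list)
    for date, pickups in rows:
        groups[date].append(pickups)
    return {date: mean(values) for date, values in groups.items()}


def busiest_days(rows):
    """Return (date, mean pickups) pairs, highest mean first."""
    averages = mean_pickups_by_date(rows)
    return sorted(averages.items(), key=lambda pair: pair[1], reverse=True)


averages = mean_pickups_by_date(rows)
ranking = busiest_days(rows)

for date, avg in ranking:
    print(date, avg)
```

The first helper corresponds roughly to groupBy("date").mean() restricted to the pickups column, and the second to the added sort with ascending=False.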