a guest
Oct 18th, 2018
>>> df.show()
+---+-------+-----+-------+
| id| ranges|score|    uom|
+---+-------+-----+-------+
|  1|    low|   20|percent|
|  1|verylow|   10|percent|
|  1|   high|   70|  bytes|
|  1| medium|   40|percent|
|  1|   high|   60|percent|
|  1|verylow|   10|percent|
|  1|   high|   70|percent|
+---+-------+-----+-------+

results = spark.sql('select percentile_approx(score, 0.95) as score, first(ranges) from subset GROUP BY id')

>>> results.show()
+-----+--------------------+
|score|first(ranges, false)|
+-----+--------------------+
|   70|                 low|
+-----+--------------------+

> pyspark.sql.utils.AnalysisException: u"expression 'subset.`ranges`' is
> neither present in the group by, nor is it an aggregate function. Add
> to group by or wrap in first() (or first_value) if you don't care
> which value you get.;;
> Aggregate [id#0L], [percentile_approx(score#2L, cast(0.95 as double), 10000, 0, 0) AS score#353L, ranges#1]
> +- SubqueryAlias subset
>    +- LogicalRDD [id#0L, ranges#1, score#2L, uom#3], false

>>> map = spark.sql('select ranges, score from df')

>>> results = spark.sql('select percentile_approx(score,0.95) as score from subset GROUP BY id')

>>> final_result = spark.sql('select r.score, m.ranges from results as r join map as m on r.score = m.score')
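The three queries above follow an "aggregate, then join back" pattern: compute the percentile per id, then join that score against a ranges-to-score map to recover the matching label (since `first(ranges)` returns an arbitrary row's value, not the one belonging to the percentile). A minimal plain-Python sketch of the same idea, with the data hard-coded from the table above; the `percentile` helper uses a simple nearest-rank rule, not Spark's `percentile_approx` algorithm, so it is only an illustration:

```python
import math

# Rows mirroring df.show() above.
rows = [
    {"id": 1, "ranges": "low",     "score": 20, "uom": "percent"},
    {"id": 1, "ranges": "verylow", "score": 10, "uom": "percent"},
    {"id": 1, "ranges": "high",    "score": 70, "uom": "bytes"},
    {"id": 1, "ranges": "medium",  "score": 40, "uom": "percent"},
    {"id": 1, "ranges": "high",    "score": 60, "uom": "percent"},
    {"id": 1, "ranges": "verylow", "score": 10, "uom": "percent"},
    {"id": 1, "ranges": "high",    "score": 70, "uom": "percent"},
]

def percentile(scores, p):
    """Nearest-rank percentile: smallest score with at least a
    fraction p of the data at or below it (not percentile_approx)."""
    ordered = sorted(scores)
    k = math.ceil(p * len(ordered))
    return ordered[max(k, 1) - 1]

# Step 1: aggregate per id (the GROUP BY id query).
by_id = {}
for r in rows:
    by_id.setdefault(r["id"], []).append(r["score"])
results = {i: percentile(s, 0.95) for i, s in by_id.items()}

# Step 2: join the per-id score back to the ranges "map" on score.
score_to_ranges = {r["score"]: r["ranges"] for r in rows}
final_result = {i: (s, score_to_ranges[s]) for i, s in results.items()}
print(final_result)  # id 1 -> (70, 'high')
```

One caveat carries over to the SQL version: joining on `score` can match more than one row when the same score appears under different `ranges` values, so the join may return duplicate (or ambiguous) labels; the dict above silently keeps the last one.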