
Untitled

a guest
Jun 18th, 2019
df.show()
label  prediction  probability
0      0           [1,2,[],[0.7558548984793847,0.2441451015206153]]
0      0           [1,2,[],[0.5190322149055472,0.4809677850944528]]
0      1           [1,2,[],[0.4884140358521083,0.5115859641478916]]
0      1           [1,2,[],[0.4884140358521083,0.5115859641478916]]
1      1           [1,2,[],[0.40305518381637956,0.5969448161836204]]
1      1           [1,2,[],[0.40570407426458577,0.5942959257354141]]

# The probability column is a VectorUDT. It is displayed like a 4-element array, but the class probabilities I want to retrieve are in the last element (I need the second value, index 1).
df.schema
StructType(List(StructField(label,DoubleType,true),StructField(prediction,DoubleType,false),StructField(probability,VectorUDT,true)))

# I tried this:
import pyspark.sql.functions as f

df.withColumn("prob_flag", f.array([f.col("probability")[3][1]])).show()

"Can't extract value from probability#6225: need struct type but got struct<type:tinyint,size:int,indices:array<int>,values:array<double>>;"