Untitled

a guest, Jun 25th, 2019
+---+----+-------+
| ID|TYPE|ROW_NUM|
+---+----+-------+
|  A| cat|      1|
|  B| cat|      2|
|  C| cat|      3|
|  D| cat|      4|
|  E| dog|      5|
|  F| cat|      6|
|  G| cat|      7|
|  H| cat|      8|
|  I| cat|      9|
|  J| dog|     10|
+---+----+-------+

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
import pyspark.sql.functions as f

# Assumption: reuse the active session (e.g. the pyspark shell's `spark`) or create one.
spark = SparkSession.builder.getOrCreate()

temp_struct = StructType([
    StructField('ID', StringType()),
    StructField('TYPE', StringType()),
    StructField('ROW_NUM', IntegerType())  # essentially the rank going in
])

temp_df = spark.createDataFrame([
    ['A', 'cat', 1],
    ['B', 'cat', 2],
    ['C', 'cat', 3],
    ['D', 'cat', 4],
    ['E', 'dog', 5],
    ['F', 'cat', 6],
    ['G', 'cat', 7],
    ['H', 'cat', 8],
    ['I', 'cat', 9],
    ['J', 'dog', 10]
], temp_struct)

temp_df.show()

the_thing_i_am_looking_to_do = 0.01  # placeholder

# the_thing_i_am_looking_to_do should be the number of rows whose ROW_NUM is <= this row's ADJUSTED_RANK.

# Cats keep their ROW_NUM; dogs get ROW_NUM * .2 plus the (still to be computed) count.
temp_df.withColumn('ADJUSTED_RANK', f.when(f.col('TYPE') == 'dog',
                                           f.col('ROW_NUM') * .2 + the_thing_i_am_looking_to_do)
                   .otherwise(f.col('ROW_NUM'))).show()
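
The comment above defines the count in terms of ADJUSTED_RANK, which itself contains the count, so it is circular as written. Below is a minimal plain-Python sketch of one possible reading, not a Spark solution. It assumes the count is taken against ROW_NUM * .2 before the count is added. The names `rows`, `DOG_SCALE` and `adjusted_rank` are made up for illustration.

```python
# Same ten rows as the paste: (ID, TYPE, ROW_NUM).
rows = [
    ('A', 'cat', 1), ('B', 'cat', 2), ('C', 'cat', 3), ('D', 'cat', 4),
    ('E', 'dog', 5), ('F', 'cat', 6), ('G', 'cat', 7), ('H', 'cat', 8),
    ('I', 'cat', 9), ('J', 'dog', 10),
]

DOG_SCALE = 0.2  # the .2 factor from the paste


def adjusted_rank(row_type, row_num, all_row_nums):
    """Cats keep ROW_NUM; dogs get the scaled rank plus a row count.

    Assumption: the count is the number of rows whose ROW_NUM is <= the
    scaled rank (ROW_NUM * DOG_SCALE), measured before the count is added.
    """
    if row_type != 'dog':
        return float(row_num)
    scaled = row_num * DOG_SCALE
    count_at_or_below = sum(1 for n in all_row_nums if n <= scaled)
    return scaled + count_at_or_below


all_row_nums = [row_num for _, _, row_num in rows]
adjusted = {
    row_id: adjusted_rank(row_type, row_num, all_row_nums)
    for row_id, row_type, row_num in rows
}

for row_id, value in adjusted.items():
    print(row_id, value)
```

Under this reading, row E scales to 1.0, one row (A) sits at or below that, and the adjusted rank is 2.0. Row J scales to 2.0, two rows sit at or below that, and the adjusted rank is 4.0. In Spark the same count could probably be obtained with a non-equi self-join on ROW_NUM <= scaled rank followed by a grouped count, but that is untested here.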