Untitled

a guest
Nov 15th, 2018
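from pyspark.ml.feature import StringIndexer
from pyspark.ml.feature import OneHotEncoder

# ...

def one_hot_encode(_df, input_column, output_column):
    # Map each distinct string in input_column to a numeric index.
    # handleInvalid='skip' drops rows whose label the fitted model does not recognize.
    indexer = StringIndexer(inputCol=input_column, outputCol=input_column+"_indexed", handleInvalid='skip')
    _model = indexer.fit(_df)
    _td = _model.transform(_df)
    # Turn the index into a one-hot vector; dropLast=True omits the last category.
    encoder = OneHotEncoder(inputCol=input_column+"_indexed", outputCol=output_column, dropLast=True)
    # Spark 2.x: OneHotEncoder is a Transformer. On Spark 3.x it is an Estimator,
    # so this call would need to be encoder.fit(_td).transform(_td) instead.
    _df2 = encoder.transform(_td)
    return _df2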
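
For readers without a Spark session at hand, here is a rough plain-Python sketch of what the two stages in one_hot_encode compute. It is not Spark code: string_index and one_hot are hypothetical helpers written for illustration, the sample labels are made up, and the alphabetical tie-break between equally frequent labels is an assumption. It mirrors StringIndexer's default ordering (most frequent label gets index 0) and OneHotEncoder's dropLast=True behaviour (the last category becomes an all-zeros vector).

```python
from collections import Counter


def string_index(values):
    """Hypothetical stand-in for StringIndexer: most frequent label gets index 0.

    Ties are broken alphabetically here; that tie-break is an assumption.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Hypothetical stand-in for OneHotEncoder, returning a dense list.

    With drop_last=True the vector has num_categories - 1 slots and the
    last category is encoded as all zeros.
    """
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


# Made-up sample column, purely for illustration.
colors = ["red", "blue", "red", "green", "red", "blue"]
mapping = string_index(colors)
encoded = [one_hot(mapping[c], len(mapping)) for c in colors]
```

The least frequent label ends up with the highest index, so with dropLast=True it is the one represented by the zero vector. Spark itself stores these as sparse vectors rather than dense lists.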