- from pyspark.ml.feature import StringIndexer
- indexer = StringIndexer(inputCol='Country', outputCol='Country_ID')
- modified_df = indexer.fit(df).transform(df)
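- # Filter on the indexed column first, then project; the original df has no Country_ID column
- modified_df.filter(modified_df['Country_ID'] == 2).select('UserId').show()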
- modified_df.columns
- # ['UserId', 'Country', 'Country_ID']
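To make it clearer which rows a filter like `Country_ID == 2` picks out, here is a rough pure-Python sketch of how `StringIndexer` assigns indices under its default `frequencyDesc` ordering: the most frequent label gets 0.0, the next 1.0, and so on. The helper name `frequency_desc_index` and the sample rows are hypothetical, made up for illustration, and the alphabetical tie-break is an assumption based on the behaviour documented for recent Spark versions. It does not use Spark at all.

```python
from collections import Counter


def frequency_desc_index(labels):
    """Map each label to a float index, most frequent label first.

    Hypothetical helper that mimics StringIndexer's default ordering.
    Ties are broken alphabetically (assumed, per recent Spark docs).
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    # StringIndexer emits doubles, so the indices are floats here too.
    return {label: float(i) for i, label in enumerate(ordered)}


# Made-up (UserId, Country) rows standing in for df.
rows = [
    (1, 'US'), (2, 'US'), (3, 'US'),
    (4, 'DE'), (5, 'DE'),
    (6, 'FR'),
]

mapping = frequency_desc_index(country for _, country in rows)

# Equivalent of filtering on Country_ID == 2 and selecting UserId.
matching_user_ids = [
    user_id for user_id, country in rows if mapping[country] == 2.0
]

print(mapping)            # {'US': 0.0, 'DE': 1.0, 'FR': 2.0}
print(matching_user_ids)  # [6]
```

So with the default ordering, `Country_ID == 2` selects users from the third most frequent country, not from any fixed country; the mapping changes whenever the data the indexer was fitted on changes.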