Dundre32

Untitled

Apr 25th, 2020
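import org.apache.spark.ml.feature.RegexTokenizer

// Split reviewText on digits, whitespace, and punctuation, but keep
// apostrophes that sit between two letters (e.g. "don't").
val regexTokenizer = new RegexTokenizer()
  .setInputCol("reviewText")
  .setOutputCol("words")
  .setPattern("\\p{Digit}|\\p{Space}|[\\p{Punct}&&[^']]|(?<![a-zA-Z])'|'(?![a-zA-Z])|\\“")

// dfNew is assumed to be a DataFrame with a string column "reviewText".
val tokenized = regexTokenizer.transform(dfNew)

tokenized.show()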
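
To see what the pattern does without starting a Spark session, the split can be approximated in plain Scala. This is only a sketch: `tokenize` is a hypothetical helper and the sample sentence is made up. It assumes the tokenizer's default settings (lowercasing on, pattern used as a delimiter, empty tokens dropped), which is how the snippet above configures it, and it is written as top-level definitions for the REPL, spark-shell, or a Scala script.

```scala
// Same delimiter pattern as in the snippet above.
val pattern =
  "\\p{Digit}|\\p{Space}|[\\p{Punct}&&[^']]|(?<![a-zA-Z])'|'(?![a-zA-Z])|\\“"

// Hypothetical helper (not part of Spark): lowercase the text, split on the
// pattern, and drop the empty strings left between adjacent delimiters.
def tokenize(text: String): Seq[String] =
  text.toLowerCase.split(pattern).filter(_.nonEmpty).toSeq

// Made-up review text for illustration.
val sample = "Don't buy 'this' 2 times, it's bad!"

println(tokenize(sample).mkString(" | "))
// prints: don't | buy | this | times | it's | bad
```

The apostrophes in "don't" and "it's" survive because each has a letter on both sides, while the quotes around 'this' are treated as delimiters by the two lookaround alternatives.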