Untitled

a guest
Oct 14th, 2019
library(tm)  # text-mining framework; stemDocument also needs the SnowballC package installed

corpus <- Corpus(VectorSource(data$text[1:3030]))  # build a corpus from the first 3030 texts
corpus <- tm_map(corpus, removeWords, c("NUMBER", "CITATION", "FORMULA"))  # remove these specific words (match is case-sensitive, so do it before lowercasing)
corpus <- tm_map(corpus, content_transformer(tolower))  # convert to lowercase
corpus <- tm_map(corpus, removeWords, stopwords("english"))  # remove stop words (the list is lowercase, so do it after lowercasing)
corpus <- tm_map(corpus, removePunctuation)  # remove punctuation
corpus <- tm_map(corpus, removeNumbers)  # remove numbers
corpus <- tm_map(corpus, stemDocument)  # reduce words to their stems
corpus <- tm_map(corpus, stripWhitespace)  # collapse the extra whitespace left by the removals above
# corpus <- tm_map(corpus, PlainTextDocument)