Untitled

a guest
Mar 18th, 2018
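# Load the packages this script relies on
library(readtext)  # readtext()
library(quanteda)  # corpus(), tokens(), tokens_ngrams(), dfm(), convert()
library(tm)        # as.TermDocumentMatrix()

# read documents
FILEDIR <- (path)  # placeholder: set to the directory holding the .txt files
txts <- readtext(file.path(FILEDIR, "*.txt"))
my_corpus <- corpus(txts)

# start processing
# Replace each whitespace character with "_" before character tokenization.
# The pattern must be "\\s"; a bare "\s" is an invalid escape in an R string.
# as.character() extracts the texts in current quanteda (quanteda 1.x used texts()).
typedPrefix <- gsub("\\s", "_", as.character(my_corpus))
typedPrefix <- tokens(typedPrefix, what = "character", remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)
# Character 1- to 3-grams, joined without a separator. Current quanteda builds n-grams with
# tokens_ngrams(); quanteda 1.x took ngrams = 1:3 and concatenator = "" directly in tokens().
typedPrefix <- tokens_ngrams(typedPrefix, n = 1:3, concatenator = "")
dfm2 <- dfm(typedPrefix)
# Convert to a tm DocumentTermMatrix, then flip it to terms-by-documents (raw counts)
tdm2 <- as.TermDocumentMatrix(convert(dfm2, to = "tm"))
as.matrix(tdm2)

# write output file
write.csv2(as.matrix(tdm2), file = "typedPrefix.csv")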