Untitled

a guest
Dec 10th, 2016
# Assumes `dtm` is a tm DocumentTermMatrix (documents in rows, terms in columns).
library(slam)  # sparse row_sums()/col_sums(); avoids as.matrix() on a large DTM
library(stm)

# Documents with fewer than 10 word tokens (after cleaning) are discarded.
dtm.lee = dtm[row_sums(dtm) >= 10, ]

# Terms below the frequency cut-off (100) are discarded.
# Counts come from the filtered matrix; the original summed over the unfiltered `dtm`.
dtm.lee = dtm.lee[, col_sums(dtm.lee) >= 100]

# Convert the pre-processed document-term matrix to stm format.
corp.final = readCorpus(dtm.lee, type = "slam")

# Remove words and renumber word indices.
corp.prep = prepDocuments(corp.final$documents, corp.final$vocab, corp.final$meta)

# *********************************************************************
# ****** This is where I get "Detected missing terms, renumbering" ****

### stm ###
tm_dtm = stm(corp.prep$documents, corp.prep$vocab, K = 0, max.em.its = 500,
             data = corp.prep$meta, init.type = "Spectral", verbose = TRUE, seed = 100)

theta = tm_dtm$theta
rownames(theta) = rownames(dtm.lee)  # <<<- this is how I keep my original id
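
The last line only labels theta correctly if every row of dtm.lee is still a document in the fitted model. Below is a minimal base-R sketch of that concern, not the stm workflow itself: it shows how a term filter can leave a document empty and how to carry a vector of surviving ids along. The toy matrix `counts`, the cut-off of 3, and the names `keep_terms`, `keep_docs`, `ids` and the stand-in `theta` are all made up for illustration; stm is not called.

```r
# Toy document-term counts (hypothetical data); rows are documents, columns are terms.
counts <- matrix(
  c(5, 0, 0,
    4, 1, 0,
    0, 0, 2,
    6, 2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("docA", "docB", "docC", "docD"), c("apple", "pear", "kiwi"))
)

# Keep terms whose total count reaches an illustrative cut-off of 3.
keep_terms <- colSums(counts) >= 3
filtered <- counts[, keep_terms, drop = FALSE]

# docC only contained the dropped term "kiwi", so it is now empty; drop it.
keep_docs <- rowSums(filtered) > 0
filtered <- filtered[keep_docs, , drop = FALSE]

# Ids of the documents that survive, in the order the model would see them.
ids <- rownames(filtered)

# Stand-in for a fitted theta: one row per surviving document, 2 topics.
theta <- matrix(1 / 2, nrow = nrow(filtered), ncol = 2)

# Fail loudly instead of mislabeling rows if the counts ever disagree.
stopifnot(nrow(theta) == length(ids))
rownames(theta) <- ids
print(rownames(theta))
```

In the stm workflow the same check would compare nrow(tm_dtm$theta) with the number of ids before assigning them. prepDocuments also reports any documents it dropped in its docs.removed element, which can be used to trim the id vector first.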