ReproducibleExampleNMcA Corpus

a guest
Mar 23rd, 2013
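library(tm)  # provides Corpus, VectorSource, DocumentTermMatrix

# Training documents
text <- c('saying text is good',
          'saying text once and saying text twice is better',
          'saying text text text is best',
          'saying text once is still ok',
          'not saying it at all is bad',
          'because text is a good thing',
          'we all like text',
          'even though sometimes it is missing')

# Validation documents: different vocabulary, but the word "text" should still be counted
validationText <- c("This has different words in it.",
                    "But I still want to count",
                    "the occurrence of text",
                    "for example")

TextCorpus <- Corpus(VectorSource(text))
ValiTextCorpus <- Corpus(VectorSource(validationText))

# Shared preprocessing options for both matrices.
# MinDocFrequency is kept from the original paste; the tm documentation describes
# frequency limits through `bounds`, so this entry may simply be ignored.
Control <- list(stopwords = TRUE, removePunctuation = TRUE, removeNumbers = TRUE, MinDocFrequency = 5)

# Each matrix is built from its own corpus, so their term columns differ
TextDTM <- DocumentTermMatrix(TextCorpus, control = Control)
ValiTextDTM <- DocumentTermMatrix(ValiTextCorpus, control = Control)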
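
# A minimal sketch, assuming the goal is to count the training vocabulary in the
# validation documents. tm's `dictionary` control option restricts tabulation to a
# given character vector of terms, and Terms() returns the terms of an existing matrix.
# ValiControl and ValiTextDTMAligned are names made up for this illustration.
# Not verified here: whether training terms that never occur in the validation
# text are kept as all-zero columns (this may depend on the tm version).
library(tm)
ValiControl <- c(Control, list(dictionary = Terms(TextDTM)))
ValiTextDTMAligned <- DocumentTermMatrix(ValiTextCorpus, control = ValiControl)
inspect(ValiTextDTMAligned)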