Advertisement
Guest User

Untitled

a guest
May 31st, 2016
56
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 0.49 KB | None | 0 0
  1. background_corpus = TextCorpus('wiki.en.text')
  2.  
  3. adding document #820000 to Dictionary(2000000 unique tokens: [u'tripolitan', u'ftdna', u'soestdijk', u'billycorgan', u'olmsville']...)
  4.  
  5. discarding 31072 tokens: [(u'vnsas', 1), (u'ezequeel', 1), (u'trapeztafel', 1), (u'pubsub', 1), (u'gyvenimas', 1), (u'gilibrand', 1), (u'catfaced', 1), (u'beuningan', 1), (u'moodadi', 1), (u'nocaster', 1)]...
  6.  
  7. keeping 2000000 tokens which were in no less than 0 and no more than 830000 (=100.0%) documents
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement