Untitled

a guest
Feb 20th, 2017
import bz2
import logging

import gensim

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

# load id->word mapping (the dictionary), one of the results of step 2 above
id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt')
# load corpus iterator
mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm')
# use this instead if you compressed the TF-IDF output (recommended):
# mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2'))

print(mm)

# sample lines, presumably from wiki_en_wordids.txt (word id, word, document frequency):
# 45549 aa 18622
# 76459 aaa 9951
# 90499 aaaa 953
# 90492 aaas 901
# 76461 aab 1101
# 76460 aac 1817
# [...]
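
The sample lines above look like gensim's text dictionary layout (word id, word, document frequency). As a rough sketch, assuming that whitespace-separated three-column layout, they can be read into a plain Python mapping without gensim. The helper name parse_wordids is hypothetical and is not part of gensim; Dictionary.load_from_text remains the way to get a real Dictionary object.

```python
def parse_wordids(lines):
    """Parse 'id word docfreq' lines into {word_id: (word, doc_freq)}.

    Hypothetical helper for illustration only; assumes three
    whitespace-separated columns per line.
    """
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blanks, headers, or elision markers such as "[...]"
        word_id, word, doc_freq = parts
        if not (word_id.isdigit() and doc_freq.isdigit()):
            continue  # skip lines whose first/last columns are not integers
        mapping[int(word_id)] = (word, int(doc_freq))
    return mapping


# The sample lines shown in the paste above
sample = [
    "45549 aa 18622",
    "76459 aaa 9951",
    "90499 aaaa 953",
    "90492 aaas 901",
    "76461 aab 1101",
    "76460 aac 1817",
    "[...]",
]

id2word_sample = parse_wordids(sample)
print(id2word_sample[45549])  # ('aa', 18622)
```

The elision marker is skipped rather than guessed at, so only the six visible entries end up in the mapping.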