Untitled

a guest
Dec 4th, 2016
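from collections import Counter, defaultdict

from nltk import pos_tag


def train(self, to_exclude, corpus: TextCorpus):
    # Maps each n-gram context to a Counter of the tagged words that follow it.
    table = defaultdict(Counter)

    for text in corpus:
        # Assumed intent: every text starts from an all-STOP_WORD context.
        words = NGram(self.STOP_WORD for _ in range(self._ngram_size))
        # pos_tag returns a list of (word, tag) pairs.
        tagged_text = pos_tag(text)
        for tagged_word in tagged_text:
            tag = tagged_word[1]
            # Skip words whose part-of-speech tag is excluded.
            if tag in to_exclude:
                continue
            table[tuple(words)][tagged_word] += 1
            # Slide the context window forward by one word.
            words.popleft()
            words.append(tagged_word)

        # Record that the text ended after the final context.
        table[tuple(words)][self.STOP_WORD] += 1

    self._table = table
    self._probabilities = {words: ProbabilityTable(counts)
                           for words, counts in table.items()}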
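
The paste does not include NGram, TextCorpus, or ProbabilityTable, so the method above cannot be run on its own. The following is a minimal standalone sketch of the same counting idea, under stated assumptions: a bounded deque stands in for NGram, plain dicts of relative frequencies stand in for ProbabilityTable, the input is already POS-tagged, and the names count_transitions, to_probabilities, and STOP_WORD are hypothetical and not taken from the original class.

```python
from collections import Counter, defaultdict, deque

STOP_WORD = None  # hypothetical sentinel marking the start and end of a text


def count_transitions(tagged_texts, to_exclude, ngram_size=2):
    """Count which tagged word follows each n-gram context."""
    table = defaultdict(Counter)
    for tagged_text in tagged_texts:
        # A bounded deque drops its oldest item on append (stand-in for NGram).
        context = deque([STOP_WORD] * ngram_size, maxlen=ngram_size)
        for tagged_word in tagged_text:
            # Skip (word, tag) pairs whose tag is excluded.
            if tagged_word[1] in to_exclude:
                continue
            table[tuple(context)][tagged_word] += 1
            context.append(tagged_word)
        # Mark the end of the text after the final context.
        table[tuple(context)][STOP_WORD] += 1
    return table


def to_probabilities(table):
    """Normalize each context's counts into relative frequencies."""
    probabilities = {}
    for context, counts in table.items():
        total = sum(counts.values())
        probabilities[context] = {w: c / total for w, c in counts.items()}
    return probabilities


# One pre-tagged text; the comma is excluded by its tag.
tagged = [[("the", "DT"), ("cat", "NN"), (",", ","), ("sat", "VBD")]]
table = count_transitions(tagged, to_exclude={","}, ngram_size=2)
probabilities = to_probabilities(table)
```

Taking pre-tagged input keeps the sketch free of the NLTK tagger and its model download; in the original method, pos_tag(text) would produce the (word, tag) pairs.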