Derp_NLTK_Derp

a guest
Feb 9th, 2016
Python
import nltk
from nltk.tag.perceptron import PerceptronTagger

# Pretrained tagger that I want to use as the fallback.
tagger = PerceptronTagger()


def pos_tag2(tokens):
    """Tag a list of tokens with the perceptron tagger."""
    return tagger.tag(tokens)


# Words whose tags I want to fix by hand (just 'select' for now).
models = {'select': 'VB'}
tagger2 = nltk.tag.UnigramTagger(model=models, backoff=nltk.DefaultTagger('derp'))

# tagger2 = nltk.tag.UnigramTagger(model=models, backoff=pos_tag2)
# I want it to do what's in the line above, but backoff expects a tagger
# object, not a plain function like pos_tag2.
# It should first tag every word found in the models dict (just 'select' for now).
# Then, if a word is not in models, it should fall back to the perceptron tagger.

p = ['I', 'just', 'drank', 'some', 'select', 'coffee', '.']

q = tagger2.tag(p)

print(q)

# Right now it outputs:
# [('I', 'derp'), ('just', 'derp'), ('drank', 'derp'), ('some', 'derp'), ('select', 'VB'), ('coffee', 'derp'), ('.', 'derp')]
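
One possible way to get the behaviour described in the comments, sketched without using backoff at all: tag the whole sentence with the fallback tagger first, then overwrite the tag of any word that appears in the models dict. The helper name tag_with_overrides and the stand-in dummy_fallback are hypothetical and not part of NLTK. The stand-in is only there so the sketch runs without downloading tagger data. This is an illustration under those assumptions, not the only way to combine the two taggers.

```python
def tag_with_overrides(tokens, overrides, fallback_tag):
    """Tag tokens with fallback_tag, then force the tags listed in overrides.

    fallback_tag is any callable that maps a list of tokens to a list of
    (word, tag) pairs, for example the tag method of a tagger object.
    """
    # Tag the full sentence first so a context-sensitive tagger sees every token.
    tagged = fallback_tag(tokens)
    # Keep the fallback's tag unless the word has a hand-written entry.
    return [(word, overrides.get(word, tag)) for word, tag in tagged]


def dummy_fallback(tokens):
    """Hypothetical stand-in for a real tagger: labels every token 'NN'."""
    return [(tok, 'NN') for tok in tokens]


models = {'select': 'VB'}
p = ['I', 'just', 'drank', 'some', 'select', 'coffee', '.']

q = tag_with_overrides(p, models, dummy_fallback)
print(q)
```

With the paste's own objects, the call would presumably be tag_with_overrides(p, models, tagger.tag), where tagger is the PerceptronTagger instance. Tagging the whole sentence before applying the overrides lets the perceptron use neighbouring words as context, which a per-word fallback would lose.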