
Untitled

a guest
May 23rd, 2018
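import nltk
from collections import Counter
from nltk.tokenize import RegexpTokenizer


def get_parts_of_speech(prod_desc):
    """Return the relative frequency of each universal POS tag in prod_desc."""
    # Keep runs of word characters and drop punctuation.
    tokenizer = RegexpTokenizer(r'\w+')
    tokens = tokenizer.tokenize(prod_desc)
    # Tag with the coarse universal tagset (NOUN, VERB, ADJ, ...).
    # Needs the 'averaged_perceptron_tagger' and 'universal_tagset' NLTK data.
    tagged = nltk.pos_tag(tokens, tagset='universal')

    # Count each tag exactly once per token.
    counts = Counter(tag for _word, tag in tagged)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty or punctuation-only description
    # Normalize so the proportions sum to 1.
    return {tag: float(count) / total for tag, count in counts.items()}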
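The counting and normalization step does not depend on NLTK itself, so it can be checked in isolation. The sketch below pulls it into a hypothetical helper, `normalize_tag_counts`, that takes a list of `(word, tag)` pairs in the same shape `nltk.pos_tag` returns. The sample tags are hand-written for illustration, not real tagger output, which lets the example run without downloading any tagger models.

```python
from collections import Counter


def normalize_tag_counts(tagged):
    """Map each POS tag to its share of all tokens.

    `tagged` is a list of (word, tag) pairs. This helper is hypothetical,
    extracted from get_parts_of_speech for illustration.
    """
    counts = Counter(tag for _word, tag in tagged)
    total = sum(counts.values())
    if total == 0:
        return {}  # avoid dividing by zero on empty input
    return {tag: float(count) / total for tag, count in counts.items()}


# Hand-written (word, tag) pairs; not actual nltk.pos_tag output.
sample = [("red", "ADJ"), ("cotton", "NOUN"), ("shirt", "NOUN"), ("with", "ADP")]
print(normalize_tag_counts(sample))  # {'ADJ': 0.25, 'NOUN': 0.5, 'ADP': 0.25}
```

Because every tag is counted once and divided by the token total, the returned proportions always sum to 1 for non-empty input.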