Zoc

Natural Language Toolkit example

Apr 12th, 2011
import nltk
from urllib.request import urlopen  # Python 3; in Python 2 this was "from urllib import urlopen"
from bs4 import BeautifulSoup  # nltk.clean_html() was removed in NLTK 3, so BeautifulSoup strips the markup

# Download the page and reduce it to plain text.
url = "http://news.bbc.co.uk/2/hi/health/2284783.stm"
html = urlopen(url).read()
raw = BeautifulSoup(html, "html.parser").get_text()

# Split the plain text into word tokens (requires NLTK's Punkt tokenizer data).
text = nltk.word_tokenize(raw)

# Alternative input: read from a local file instead of a URL.
# f = open("C:\\folder\\file.txt")
# f.readline()

from nltk.corpus import brown
# from nltk.corpus import mac_morpho  # PT-BR
train_corpus = brown.tagged_sents(categories="news")
tags = [tag for (word, tag) in brown.tagged_words(categories="news")]
nltk.FreqDist(tags).max()  # Example: returns the most frequent tag (not word) in the news category

default_tagger = nltk.DefaultTagger('NN')  # Example
default_tagger.tag(text)  # Example of how tagging works: this tagger labels every token as a noun ('NN')

t0 = nltk.DefaultTagger('NN')
# Train a unigram tagger on train_corpus; words it never saw in training fall back to t0.
t1 = nltk.UnigramTagger(train_corpus, backoff=t0)
t1.tag(text)
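
The backoff behaviour can be checked without downloading the Brown corpus. The following is a minimal sketch, assuming a tiny hand-tagged training set (`train_sents` and its sentences are made up for illustration and are not part of the original paste). Words seen in training get their learned tag, and unseen words should fall through to the default tagger.

```python
import nltk

# Hypothetical hand-tagged training sentences (illustration only, not Brown data).
train_sents = [
    [("the", "AT"), ("doctor", "NN"), ("said", "VBD")],
    [("the", "AT"), ("study", "NN"), ("said", "VBD")],
]

# Same chain as in the paste: unigram tagger first, default 'NN' tagger as backoff.
t0 = nltk.DefaultTagger("NN")
t1 = nltk.UnigramTagger(train_sents, backoff=t0)

# "patient" never appears in train_sents, so it is handled by the backoff tagger.
tagged = t1.tag(["the", "patient", "said"])
print(tagged)
```

Here "the" and "said" are tagged from the training data, while "patient" receives 'NN' from `t0`. With a real corpus the same pattern applies, only with far more known words.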