- ## Made in Python
- Used the following modules:
- * nltk
- * flask
- NLTK
- nltk.tag.tnt
- TnT - Statistical POS tagger
- TnT uses a second-order Markov model to produce tags for an input sequence.
- The set of possible tags for a given word is derived from the training data: it is the set of all tags that exact word was assigned there.
- TnT DOES NOT AUTOMATICALLY DEAL WITH UNSEEN WORDS
- TnT SHOULD BE USED WITH SENTENCE-DELIMITED INPUT
- Input for the tag function is a single sentence (a list of words); input for the tagdata function is a list of such sentences. Output has a similar form, with each word paired with its tag as ('WORD', 'TAG').
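
The input and output shapes described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the three-sentence training set and its tags are made up for the example, and a real tagger would be trained on a proper tagged corpus. It also shows what happens with an unseen word: by default TnT labels it "Unk", and passing another tagger through the `unk` argument (here NLTK's `DefaultTagger`, chosen only as a simple stand-in) supplies a fallback tag instead.

```python
from nltk.tag import DefaultTagger, tnt

# Hypothetical toy training data: a list of sentences,
# each sentence a list of (word, tag) tuples.
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VB")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VB")],
    [("dogs", "NN"), ("bark", "VB")],
]

tagger = tnt.TnT()
tagger.train(train_sents)

# tag() takes ONE sentence, given as a list of words.
single = tagger.tag(["the", "dog", "sleeps"])

# tagdata() takes a LIST of sentences and returns one tagged list per sentence.
batch = tagger.tagdata([["the", "cat", "barks"], ["dogs", "bark"]])

# "unicorn" never appears in the training data, so plain TnT tags it "Unk".
unseen = tagger.tag(["the", "unicorn", "sleeps"])

# Assumption for illustration: fall back to "NN" for every unknown word.
# Trained=True tells TnT not to call train() on the fallback tagger.
fallback_tagger = tnt.TnT(unk=DefaultTagger("NN"), Trained=True)
fallback_tagger.train(train_sents)
unseen_with_fallback = fallback_tagger.tag(["the", "unicorn", "sleeps"])

print(single)
print(batch)
print(unseen)
print(unseen_with_fallback)
```

Each sentence is tagged on its own, which is why the input should already be split into sentences before it reaches `tag` or `tagdata`.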
- WORKING
- The probability of a tag for a given word is the linear interpolation of three Markov models: a zero-order, a first-order, and a second-order model.
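
To make the interpolation concrete, here is a small standard-library sketch of the idea. It is not NLTK's implementation: the function names `count_tag_ngrams` and `interpolated_prob` are hypothetical, and the weights in `lambdas` are illustrative values picked by hand, whereas TnT estimates its weights from the training data. The zero-order model looks at the tag alone, the first-order model conditions on the previous tag, and the second-order model conditions on the previous two tags.

```python
from collections import Counter


def count_tag_ngrams(tag_sequences):
    """Count each tag alone, after one preceding tag, and after two preceding tags."""
    uni, bi, tri = Counter(), Counter(), Counter()
    bi_ctx, tri_ctx = Counter(), Counter()
    for tags in tag_sequences:
        # Pad with two start markers so every tag has two predecessors.
        padded = ["BOS", "BOS"] + list(tags)
        for t1, t2, t3 in zip(padded, padded[1:], padded[2:]):
            uni[t3] += 1
            bi[(t2, t3)] += 1
            tri[(t1, t2, t3)] += 1
            bi_ctx[t2] += 1          # how often t2 occurs as a one-tag history
            tri_ctx[(t1, t2)] += 1   # how often (t1, t2) occurs as a two-tag history
    return uni, bi, tri, bi_ctx, tri_ctx


def interpolated_prob(tag, t1, t2, counts, lambdas):
    """Weighted sum of the zero-, first-, and second-order estimates for `tag`."""
    uni, bi, tri, bi_ctx, tri_ctx = counts
    l1, l2, l3 = lambdas
    p_uni = uni[tag] / sum(uni.values())
    p_bi = bi[(t2, tag)] / bi_ctx[t2] if bi_ctx[t2] else 0.0
    p_tri = tri[(t1, t2, tag)] / tri_ctx[(t1, t2)] if tri_ctx[(t1, t2)] else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


# Toy tag sequences (made up for the example).
counts = count_tag_ngrams([["DT", "NN", "VB"], ["DT", "NN", "VB"], ["NN", "VB"]])
lambdas = (0.2, 0.3, 0.5)  # illustrative weights that sum to 1

# Probability of each tag right after a sentence-initial determiner.
p_nn = interpolated_prob("NN", "BOS", "DT", counts, lambdas)
p_vb = interpolated_prob("VB", "BOS", "DT", counts, lambdas)
print(p_nn, p_vb)
```

In this toy data a noun always follows a sentence-initial determiner, so all three models support "NN" and it scores far higher than "VB", which only receives mass from the zero-order term.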
- Functions used
- * word_tokenize(s)
- Tokenizes a string, splitting off punctuation other than periods.
- * train(data)
- Uses a set of tagged data to train the tagger.
- * tag(data)
- Determines the most appropriate tag sequence for the given token sequence and returns a corresponding list of tagged tokens. A tagged token is encoded as a tuple ('WORD', 'TAG').
- * evaluate(gold)
- Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
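
The four functions above can be chained roughly as shown below. This is a sketch under stated assumptions, not the project's code: the training and gold sentences are toy data invented for the example, and `preserve_line=True` is used so that `word_tokenize` skips sentence splitting and needs no downloaded tokenizer model. Recent NLTK releases expose the scoring step as `accuracy()` and deprecate `evaluate()`, so the sketch uses whichever one is available.

```python
from nltk.tag import tnt, untag
from nltk.tokenize import word_tokenize

# Hypothetical toy training data, with the final period tagged as ".".
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VB"), (".", ".")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VB"), (".", ".")],
    [("dogs", "NN"), ("bark", "VB"), (".", ".")],
]

tagger = tnt.TnT()
tagger.train(train_sents)

# word_tokenize turns a raw string into the token list that tag() expects.
# preserve_line=True treats the whole string as one sentence.
tokens = word_tokenize("the cat barks.", preserve_line=True)
tagged = tagger.tag(tokens)

# Hypothetical gold standard; "bird" is unseen, so the tagger will miss it.
gold = [
    [("the", "DT"), ("dog", "NN"), ("sleeps", "VB"), (".", ".")],
    [("the", "DT"), ("bird", "NN"), ("barks", "VB"), (".", ".")],
]

# Prefer accuracy() where it exists, fall back to evaluate() on older NLTK.
score_fn = getattr(tagger, "accuracy", tagger.evaluate)
score = score_fn(gold)

# The same computation spelled out: strip the tags, retag, compare token by token.
retagged = tagger.tagdata([untag(sent) for sent in gold])
pairs = [(g, r) for gs, rs in zip(gold, retagged) for g, r in zip(gs, rs)]
manual_score = sum(g == r for g, r in pairs) / len(pairs)

print(tokens)
print(tagged)
print(score, manual_score)
```

The single unseen word costs one of the eight gold tokens, which shows how unknown words pull the score down when no fallback tagger is configured.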