Untitled

a guest
Jun 17th, 2019
>>> from itertools import chain
>>> import pandas as pd
>>> from nltk import word_tokenize
>>> from nltk import FreqDist
>>> from nltk.util import everygrams  # missing in the original paste; needed below

>>> df = pd.read_csv('x')
>>> df['Description']
0           Here is a sentence.
1    This is a foo bar sentence.
Name: Description, dtype: object

>>> # Tokenize each description into a list of words
>>> df['Description'].map(word_tokenize)
0              [Here, is, a, sentence, .]
1    [This, is, a, foo, bar, sentence, .]
Name: Description, dtype: object

>>> sents = df['Description'].map(word_tokenize).tolist()

>>> # Count every 1-, 2- and 3-gram across all sentences
>>> FreqDist(list(chain(*[everygrams(sent, 1, 3) for sent in sents])))
FreqDist({('sentence',): 2, ('is', 'a'): 2, ('sentence', '.'): 2, ('is',): 2, ('.',): 2, ('a',): 2, ('Here', 'is', 'a'): 1, ('a', 'foo'): 1, ('a', 'sentence'): 1, ('bar', 'sentence', '.'): 1, ...})
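
For anyone wanting to see what the everygrams + FreqDist step is doing, here is a rough standard-library sketch of the same count. It is not NLTK's implementation: everygrams_sketch is a hypothetical helper written for illustration, and the two sentences are tokenized by hand (an assumption, to avoid needing the NLTK tokenizer data) rather than read from the CSV.

```python
from collections import Counter


def everygrams_sketch(tokens, min_len=1, max_len=3):
    """Yield every contiguous n-gram of length min_len..max_len as a tuple.

    Hypothetical stand-in for nltk.util.everygrams, for illustration only.
    """
    for n in range(min_len, max_len + 1):
        # A sentence shorter than n produces no n-grams (empty range).
        for start in range(len(tokens) - n + 1):
            yield tuple(tokens[start:start + n])


# Hand-tokenized versions of the two example descriptions (assumed tokens).
sents = [
    ["Here", "is", "a", "sentence", "."],
    ["This", "is", "a", "foo", "bar", "sentence", "."],
]

# Count 1-, 2- and 3-grams over all sentences, like FreqDist over the chain.
counts = Counter(gram for sent in sents for gram in everygrams_sketch(sent))

print(counts[("is", "a")])
print(counts[("Here", "is", "a")])
```

Counter plays the role of FreqDist here; the n-gram tuples and their counts should match the session above, though the display order may differ.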