vegaseat

word frequency count

Mar 13th, 2015
''' Word_frequency3a.py
experiments with string processing
preprocess the string and do a word frequency count
words with matching frequency keep first-seen order on Python 3.7+
(their order is arbitrary on older versions)
'''

from string import punctuation
from collections import Counter

# sample text for testing (could come from a text file)
text = """\
A mouse can slip through a hole the size of a penny.
A giraffe can clean its ears with its tongue.
A giraffe's heart beats 50 times a minute.
A dog's heart beats 100 times a minute.
A hedgehog's heart beats 300 times a minute.
"""

# since giraffe's would turn into giraffes, optionally remove 's
text2 = text.replace("'s", "")

# remove punctuation marks and change to lower case
text3 = ''.join(c for c in text2.lower() if c not in punctuation)

# text3.split() splits text3 at whitespace
word_list = text3.split()
# create a list of (word, frequency) tuples of the 10
# most common words, sorted by descending frequency
wf_tuple_list = Counter(word_list).most_common(10)

for w, f in wf_tuple_list:
    # str.format() style, available in Python 2.7 and higher
    print("{:3d}  {}".format(f, w))

''' result (10 most common words) ...
(the order of words with equal frequency may differ by Python version)
 10  a
  3  heart
  3  times
  3  beats
  3  minute
  2  can
  2  its
  2  giraffe
  1  slip
  1  mouse
'''
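
The order of words that share a frequency in the result above comes from how Counter happens to store them, so it can change between Python versions. If a reproducible order is wanted, one option is to sort explicitly by descending count and then alphabetically. The sketch below is only an illustration, not part of the original paste: the function name word_frequencies, its top_n parameter, and the short sample string are all made up for the example.

```python
from collections import Counter
from string import punctuation


def word_frequencies(text, top_n=10):
    """Return up to top_n (word, count) pairs, ties sorted alphabetically.

    Hypothetical helper: same preprocessing steps as the paste above.
    """
    # drop possessive 's, change to lower case, then strip punctuation
    cleaned = text.replace("'s", "").lower()
    cleaned = "".join(c for c in cleaned if c not in punctuation)
    counts = Counter(cleaned.split())
    # sort by descending count first, then by the word itself
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# made-up sample text, shorter than the one in the paste
sample = "A dog's heart beats. A giraffe's heart beats. A mouse can slip."
for word, count in word_frequencies(sample, top_n=4):
    print("{:3d}  {}".format(count, word))
```

The sort key (-count, word) trades the first-seen order of most_common() for an order that does not depend on the Python version or on where a word first appears in the text.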