Untitled

a guest
Oct 27th, 2016
This is a foo bar sentence .
And this is the first txtfile in the corpus .

Counter({('i', 's', '</w>'): 2, ('t', 'h', 'e', '</w>'): 2, ('.', '</w>'): 2, ('T', 'h', 'i', 's', '</w>'): 1, ('f', 'i', 'r', 's', 't', '</w>'): 1, ('t', 'x', 't', 'f', 'i', 'l', 'e', '</w>'): 1, ('f', 'o', 'o', '</w>'): 1, ('t', 'h', 'i', 's', '</w>'): 1, ('s', 'e', 'n', 't', 'e', 'n', 'c', 'e', '</w>'): 1, ('A', 'n', 'd', '</w>'): 1, ('b', 'a', 'r', '</w>'): 1, ('c', 'o', 'r', 'p', 'u', 's', '</w>'): 1, ('a', '</w>'): 1, ('i', 'n', '</w>'): 1})

$ echo -e """This is a foo bar sentence .\nAnd this is the first txtfile in the corpus .""" > test.txt
$ cat test.txt
This is a foo bar sentence .
And this is the first txtfile in the corpus .
$ python
>>> from collections import Counter
>>> open('test.txt').read().split()
['This', 'is', 'a', 'foo', 'bar', 'sentence', '.', 'And', 'this', 'is', 'the', 'first', 'txtfile', 'in', 'the', 'corpus', '.']
>>> Counter(open('test.txt').read().split())
Counter({'is': 2, '.': 2, 'the': 2, 'a': 1, 'And': 1, 'bar': 1, 'sentence': 1, 'This': 1, 'txtfile': 1, 'this': 1, 'in': 1, 'foo': 1, 'corpus': 1, 'first': 1})
>>> Counter(map(lambda x: tuple(list(x)+['</w>']), open('test.txt').read().split()))
Counter({('i', 's', '</w>'): 2, ('t', 'h', 'e', '</w>'): 2, ('.', '</w>'): 2, ('T', 'h', 'i', 's', '</w>'): 1, ('f', 'i', 'r', 's', 't', '</w>'): 1, ('t', 'x', 't', 'f', 'i', 'l', 'e', '</w>'): 1, ('f', 'o', 'o', '</w>'): 1, ('t', 'h', 'i', 's', '</w>'): 1, ('s', 'e', 'n', 't', 'e', 'n', 'c', 'e', '</w>'): 1, ('A', 'n', 'd', '</w>'): 1, ('b', 'a', 'r', '</w>'): 1, ('c', 'o', 'r', 'p', 'u', 's', '</w>'): 1, ('a', '</w>'): 1, ('i', 'n', '</w>'): 1})

>>> x = Counter()
>>> for line in open('test.txt'):
...     for word in line.split():
...         x[word]+=1
...
>>> x = Counter({tuple(list(k)+['</w>']):v for k,v in x.items()})
>>> x
Counter({('i', 's', '</w>'): 2, ('t', 'h', 'e', '</w>'): 2, ('.', '</w>'): 2, ('T', 'h', 'i', 's', '</w>'): 1, ('t', 'x', 't', 'f', 'i', 'l', 'e', '</w>'): 1, ('f', 'o', 'o', '</w>'): 1, ('t', 'h', 'i', 's', '</w>'): 1, ('s', 'e', 'n', 't', 'e', 'n', 'c', 'e', '</w>'): 1, ('f', 'i', 'r', 's', 't', '</w>'): 1, ('b', 'a', 'r', '</w>'): 1, ('c', 'o', 'r', 'p', 'u', 's', '</w>'): 1, ('a', '</w>'): 1, ('i', 'n', '</w>'): 1, ('A', 'n', 'd', '</w>'): 1})
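
The two-step version in the session (count words line by line, then re-key each word as a tuple of characters plus '</w>') could be wrapped in a small function. This is only a sketch: the name char_vocab and the end_marker parameter are hypothetical and do not appear in the paste, and it takes any iterable of lines rather than a file path so it can be tried without test.txt.

```python
from collections import Counter


def char_vocab(lines, end_marker='</w>'):
    """Count whitespace-separated words, keyed as character tuples plus an end marker.

    `char_vocab` and `end_marker` are hypothetical names used for illustration.
    """
    counts = Counter()
    for line in lines:
        # Same tokenization as the session: split on whitespace.
        counts.update(line.split())
    # Re-key each word as a tuple of its characters followed by the marker.
    return Counter({tuple(word) + (end_marker,): n for word, n in counts.items()})


sample = [
    "This is a foo bar sentence .",
    "And this is the first txtfile in the corpus .",
]
vocab = char_vocab(sample)
print(vocab[('i', 's', '</w>')])  # 2
```

To read from the file instead, an open file object can be passed directly, since iterating over it yields lines: with open('test.txt') as f: vocab = char_vocab(f). Using a with block also closes the file, which the bare open() calls in the session leave to the interpreter.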