Untitled

a guest, Feb 2nd, 2011
from Levenshtein import distance

WORD = 'causes'

# Read the word list, one word per line, stripping surrounding whitespace
with open('word.list') as f:
    word_list = [line.strip() for line in f]

# Build trie forest: one trie per Levenshtein distance from WORD
tries = {}
for word in word_list:
    node = tries.setdefault(distance(word, WORD), {'size': 0})
    node['size'] += 1
    # '$' marks the end of a word
    for l in word + '$':
        node = node.setdefault(l, {})

print('Trie sizes:')
# Sort by distance so the report does not depend on dict ordering
for i, trie in sorted(tries.items()):
    print('%s: %s' % (i, trie.pop('size')))

Trie sizes:
0: 1
1: 18
2: 232
3: 2262
4: 12622
5: 34862
6: 45920
7: 44139
8: 39836
9: 30594
10: 21427
11: 14021
12: 8416
13: 4840
14: 2459
15: 1293
16: 596
17: 301
18: 114
19: 59
20: 21
21: 11
22: 6
23: 1
24: 1
25: 4
26: 2
29: 2
41: 1
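
The sizes above only count how many words land in each trie. As a rough sketch of how such a forest could be read back, the snippet below rebuilds it for a tiny in-memory list and walks one trie to recover its words. Everything here is an assumption for illustration: `levenshtein` is a plain dynamic-programming stand-in for `Levenshtein.distance` so the example runs without the extension module, and `build_forest`, `words_in`, and the `sample` list are hypothetical names, not part of the paste.

```python
def levenshtein(a, b):
    """Plain DP edit distance (assumed stand-in for Levenshtein.distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def build_forest(words, target):
    """Group words into one trie per edit distance from target."""
    forest = {}
    for word in words:
        node = forest.setdefault(levenshtein(word, target), {'size': 0})
        node['size'] += 1
        for ch in word + '$':  # '$' marks the end of a word
            node = node.setdefault(ch, {})
    return forest


def words_in(trie, prefix=''):
    """Yield every word stored in a trie by walking it depth-first."""
    for key, child in trie.items():
        if key == 'size':      # bookkeeping entry on the root, not a letter
            continue
        if key == '$':
            yield prefix
        else:
            yield from words_in(child, prefix + key)


# Hypothetical sample data, not taken from word.list
sample = ['causes', 'cause', 'pauses', 'clauses', 'cases', 'house']
forest = build_forest(sample, 'causes')
for d in sorted(forest):
    print(d, forest[d]['size'], sorted(words_in(forest[d])))
```

Unlike the script above, this sketch leaves `'size'` in place instead of popping it, so the walker has to skip that key at the root.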