Untitled

a guest
Aug 17th, 2017
# encoding: utf8

"""
Created on 2017.08.17

@author: yalei
"""


from __future__ import unicode_literals

import sys

import pinyin
from Pinyin2Hanzi import DefaultDagParams
from Pinyin2Hanzi import dag


dagParams = DefaultDagParams()


def fulanhua(string, verbose=True):
    # Convert the input to toneless pinyin, one syllable per character.
    s = pinyin.get(string, format="strip", delimiter=" ")
    words = s.split()
    res = []
    # Substitutions applied inside a syllable after its initial is rewritten.
    rules = {'ong': 'eng'}
    for word in words:
        # Keep syllables as text; encoding to bytes here breaks on Python 3.
        new_word = word
        if new_word.startswith('hu') and len(new_word) > 2:
            # hua -> fa, huang -> fang
            new_word = 'f' + new_word[2:]
        elif new_word.startswith('h'):
            # A bare "hu" lands here and keeps its vowel (hu -> fu),
            # which is what the sample output below shows.
            new_word = 'f' + new_word[1:]
        elif new_word.startswith('n'):
            new_word = 'l' + new_word[1:]
        # Distinct loop names, so this loop cannot shadow the outer one.
        for old, new in rules.items():
            new_word = new_word.replace(old, new)
        res.append(new_word)
        if verbose:
            print('%s -> %s' % (word, new_word))
    # Map the rewritten pinyin back to candidate Hanzi sequences.
    result = dag(dagParams, res, path_num=10, log=False)
    return ' '.join(res), result


if __name__ == '__main__':
    string = sys.argv[1]
    py, hz = fulanhua(string, verbose=False)
    for item in hz:
        score = item.score
        res = ''.join(item.path)
        print(' %s %s' % (score, res))
    if not hz:
        print(py)

"""
>>> python fulanhua.py 你能不能别说你是湖南人
3.27469370593e-05 理冷不冷别说历史辅懒人

>>> python fulanhua.py 你能不能别说普通话
0.000149049266441 理冷不冷别说扑腾发

>>> python fulanhua.py 黄花机场
0.0794434315813 方法机场
"""
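
The syllable rewriting can be tried on its own, without pinyin or Pinyin2Hanzi installed. The following is a minimal sketch, not the author's code: the names convert_syllable, convert_syllables and FINAL_RULES are hypothetical, the input is assumed to be toneless pinyin already split into syllables, and the guard that lets a bare "hu" keep its vowel is an assumption inferred from the sample output above (湖 rendered as 辅).

```python
# Hypothetical stand-alone helper: the same initial and final substitutions,
# applied to toneless pinyin syllables, with no third-party dependencies.
FINAL_RULES = {'ong': 'eng'}


def convert_syllable(syllable):
    """Return the rewritten form of one toneless pinyin syllable."""
    if syllable.startswith('hu') and len(syllable) > 2:
        # Assumption: drop the medial "u" only when more letters follow.
        syllable = 'f' + syllable[2:]      # hua -> fa, huang -> fang
    elif syllable.startswith('h'):
        syllable = 'f' + syllable[1:]      # hu -> fu, hong -> fong
    elif syllable.startswith('n'):
        syllable = 'l' + syllable[1:]      # ni -> li, neng -> leng
    for old, new in FINAL_RULES.items():
        syllable = syllable.replace(old, new)  # tong -> teng, fong -> feng
    return syllable


def convert_syllables(syllables):
    """Apply convert_syllable to every syllable in a list."""
    return [convert_syllable(s) for s in syllables]


if __name__ == '__main__':
    sample = 'ni neng bu neng bie shuo pu tong hua'.split()
    print(' '.join(convert_syllables(sample)))
```

Keeping the string rules separate from the Hanzi lookup makes them easy to check in isolation; the rewritten syllables can then be passed to dag() as in the script above.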