
Untitled

a guest
Feb 20th, 2019
symbol = "ῇ̣"
print(len(symbol))
# Output: 2

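The length of 2 most likely comes from the string holding two Unicode code points: a precomposed Greek letter followed by a separate combining dot below. Unicode has no single precomposed code point for this combination, so NFC normalization is not expected to merge them. Below is a minimal sketch for inspecting the string with the standard-library `unicodedata` module; the escape sequences are an assumption about which code points the pasted string contains.

```python
import unicodedata

# Assumption: the pasted string is U+1FC7 followed by U+0323.
symbol = "\u1fc7\u0323"

for ch in symbol:
    # combining() returns a non-zero class for combining marks, 0 otherwise.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

# len() counts code points, not the characters a reader sees.
code_point_count = len(symbol)
```
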
# -*- coding: utf-8 -*-
import csv
from alphabet_detector import AlphabetDetector

ad = AlphabetDetector()
with open("tbltext.csv", "r", encoding="utf8", newline="") as txt:
    data = csv.reader(txt)
    for row in data:
        text = row[1]
        # Here I have some string manipulation (lowercasing everything,
        # replacing a predefined set of strings with equal-length '-', ...).
        # Then I use the ad module to detect the language by looping over
        # the characters; this is where it goes wrong.
        for letter in text:
            lang = ad.detect_alphabet(letter)

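One possible way to keep the per-letter loop from tripping over diacritics is to skip combining marks, so that detection only ever sees base letters. The sketch below assumes that approach; `detect_per_letter` and `first_name_word` are hypothetical helpers, and `first_name_word` is only a stand-in so the example runs without the `alphabet_detector` package.

```python
import unicodedata

def detect_per_letter(text, detect):
    """Return (letter, result) pairs for base letters, skipping combining marks.

    `detect` is any callable that takes a one-character string, for example
    ad.detect_alphabet from the script above.
    """
    results = []
    for letter in text:
        if unicodedata.combining(letter):
            continue  # a diacritic belongs to the preceding base letter
        results.append((letter, detect(letter)))
    return results

def first_name_word(ch):
    # Hypothetical stand-in detector: first word of the Unicode character name.
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]

# Assumed input: nu and tau, each followed by a combining dot below.
pairs = detect_per_letter("\u03bd\u0323\u03c4\u0323", first_name_word)
```

Skipping marks changes how many items the loop produces, so if positions must line up with the equal-length '-' replacements, it may be safer to iterate with `enumerate` and keep the original indices.
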
>>> word = "ἐ̣ν̣τ̣ῇ̣[αὐτ]ῇ"
>>> for letter in word:
...     print(letter)
...
ἐ
̣
ν
̣
τ
̣
ῇ
̣
[
α
ὐ
τ
]
ῇ
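
The output above shows each combining dot printed on its own line, separated from its base letter. A minimal sketch of one way to keep them together, assuming it is enough to attach every combining mark to the character before it; `split_clusters` is a hypothetical helper, and the escape sequences are an assumption about the code points in the pasted word. This is a simplification of full grapheme-cluster segmentation, which the third-party `regex` module's `\X` pattern handles more completely.

```python
import unicodedata

def split_clusters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

# Assumed code points of the word from the session above.
word = "\u1f10\u0323\u03bd\u0323\u03c4\u0323\u1fc7\u0323[\u03b1\u1f50\u03c4]\u1fc7"

clusters = split_clusters(word)
for cluster in clusters:
    print(cluster)
```

Joining the clusters back together reproduces the original string, so the grouping loses no information.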