- symbol = "ῇ̣"
- print(len(symbol))
- # Output: 2
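A minimal sketch of why the length is 2, assuming the symbol is stored as the precomposed letter U+1FC7 followed by the combining mark U+0323 (which is what the per-character printout further down suggests). The escapes stand in for the literal character so the code points are unambiguous; `names` is just an illustrative variable.

```python
import unicodedata

# Assumption: the symbol is U+1FC7 (precomposed eta with accents)
# followed by U+0323 (combining dot below).
symbol = "\u1fc7\u0323"

# len() counts code points, not what a reader sees as one character.
print(len(symbol))

# Look up the Unicode name of each code point to see what is inside.
names = [unicodedata.name(ch) for ch in symbol]
for ch, name in zip(symbol, names):
    # combining() is non-zero for marks that attach to the preceding character.
    print(f"U+{ord(ch):04X} {name} combining={unicodedata.combining(ch)}")
```

The second code point is a combining mark, so the string displays as one character but contains two.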
- # -*- coding: utf-8 -*-
- import csv
- from alphabet_detector import AlphabetDetector
- ad = AlphabetDetector()
- with open("tbltext.csv", "r", encoding="utf8", newline="") as txt:
-     data = csv.reader(txt)
-     for row in data:
-         text = row[1]
-         # Some string manipulation happens here (lowercasing everything,
-         # replacing a predefined set of strings with equal-length runs of '-', ...).
-         # Then I use the ad module to detect the alphabet by looping over the
-         # characters; this is where it goes wrong.
-         for letter in text:
-             lang = ad.detect_alphabet(letter)
- >>>
- >>> word = "ἐ̣ν̣τ̣ῇ̣[αὐτ]ῇ"
- >>> for letter in word:
- ...     print(letter)
- ...
- ἐ
- ̣
- ν
- ̣
- τ
- ̣
- ῇ
- ̣
- [
- α
- ὐ
- τ
- ]
- ῇ
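The printout above shows each combining dot arriving as its own loop item. One possible workaround, sketched here with only the standard library, is to attach every combining mark to the base character before it and iterate over those groups instead. `split_clusters` is a hypothetical helper, not part of the paste or of `alphabet_detector`, and it is a simplification of full grapheme-cluster segmentation (it only handles combining marks). The word is written with escapes, assuming the code points match the printout above.

```python
import unicodedata

def split_clusters(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper: a simplification that only looks at the
    canonical combining class, not full grapheme-cluster rules.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            # Combining mark: append it to the previous cluster.
            clusters[-1] += ch
        else:
            # Base character (or a leading mark): start a new cluster.
            clusters.append(ch)
    return clusters

# Assumption: same code points as the word in the session above.
word = "\u1f10\u0323\u03bd\u0323\u03c4\u0323\u1fc7\u0323[\u03b1\u1f50\u03c4]\u1fc7"

clusters = split_clusters(word)
for cluster in clusters:
    print(cluster)
```

Each item now carries its dot with it. In the CSV loop, the first character of each cluster (`cluster[0]`) could be passed to `ad.detect_alphabet` instead of every raw code point; that pairing is a suggestion, not something the paste itself does.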