a guest
Apr 21st, 2018
  1. <spectie> no obvious POS, but i suppose you can tell in some cases
  2. <spectie> from the verb morphology in kk and in english
  3. <nathan0n5ire> the words that end in - are verbs
  4. <spectie> and sometimes it has POS
  5. <spectie> АЙЛЫ /adj./ moonlight, moonlit.
  6. <nathan0n5ire> yes adjectives are marked
  7. <nathan0n5ire> I don't think the other ones are though
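A minimal sketch of the two POS cues mentioned so far: an explicit slash-delimited tag such as /adj./, and a trailing hyphen on the headword for verbs. The function name guess_pos and the tag pattern are assumptions for illustration; only /adj./ is actually attested in the log, so other tags may look different in the real dictionary.

```python
import re

# Assumed tag shape, modelled on the "/adj./" example only.
POS_TAG = re.compile(r"/([a-z]+)\./")

def guess_pos(entry):
    """Return a POS guess for one dictionary line, or None if nothing marks it."""
    parts = entry.split(maxsplit=1)
    if not parts:
        return None  # blank line
    tag = POS_TAG.search(entry)
    if tag:
        return tag.group(1)
    # Headwords ending in "-" are verbs, per the discussion above.
    if parts[0].endswith("-"):
        return "verb"
    return None
```

Entries with neither cue come back as None, which matches the observation that most entries carry no POS at all.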
  8. <spectie> it seems like kazakh stems are marked by being only in uppercase
  9. <spectie> and also in cyrillic (obv)
  10. <spectie> so something could be done with that
  11. <spectie> to extract just the stems
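The stem idea could be sketched roughly like this: take the leading run of uppercase Kazakh Cyrillic letters at the start of a line. The names KAZAKH_UPPER, STEM and extract_stem are hypothetical, and the optional trailing hyphen is an assumption based on the remark that verbs end in "-".

```python
import re

# Uppercase letters of the Kazakh Cyrillic alphabet.
KAZAKH_UPPER = "АӘБВГҒДЕЁЖЗИЙКҚЛМНҢОӨПРСТУҰҮФХҺЦЧШЩЪЫІЬЭЮЯ"

# Leading uppercase run, optional verb hyphen, then whitespace or end of line.
STEM = re.compile(rf"^([{KAZAKH_UPPER}]+-?)(?=\s|$)")

def extract_stem(line):
    """Return the uppercase Cyrillic headword of a line, or None."""
    m = STEM.match(line.strip())
    return m.group(1) if m else None
```

Lines whose headword was mangled into Latin letters by the OCR return None here, so they drop out of this step and need separate handling.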
  12. <nathan0n5ire> all of the kazakh words are in uppercase
  13. <spectie> АЙЛЫҚ monthly; monthly wages; ... MEP-31M period of one month.
  14. <spectie>
  15. <spectie> ah
  16. <nathan0n5ire> sometimes the ocr also messed up
  17. <nathan0n5ire> like AJIMAJIA- to embrace.
  18. <spectie> MEP-31M is an OCR error
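One rough way to flag likely OCR damage of this kind is to look for all-caps tokens made only of ASCII letters, digits and hyphens, since real headwords should be Cyrillic. This is a heuristic sketch, not something proposed in the chat; suspicious_tokens and the minimum length of 3 are assumptions, and genuine English abbreviations in a gloss would be flagged too.

```python
import re

# ASCII capitals, digits and hyphens only, with at least one capital letter.
LATIN_CAPS = re.compile(r"^[A-Z0-9-]*[A-Z][A-Z0-9-]*$")

def suspicious_tokens(line):
    """Return whitespace-separated tokens that look like mis-OCRed Cyrillic."""
    return [t for t in line.split() if len(t) >= 3 and LATIN_CAPS.match(t)]
```

Flagged lines could be set aside for manual correction rather than fixed automatically.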
  19. <spectie> i would approach this in passes
  20. <spectie> i would start by extracting the really easy stuff
  21. <spectie> where you just have two words:
  22. <spectie> АЙТУ pronunciation.
  23. <spectie> АЙМАҚТЫҚ regional.
  24. <spectie> АЙНАЛАДА around.
  25. <spectie>
  26. <spectie> etc.
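The first pass might look something like the sketch below: keep entries that are exactly one headword plus a one-word gloss ending in a full stop, and hand everything else on to later passes. SIMPLE and first_pass are made-up names, and the pattern assumes glosses are plain ASCII letters.

```python
import re

# One headword token, whitespace, one alphabetic gloss word, final full stop.
SIMPLE = re.compile(r"^(\S+)\s+([A-Za-z]+)\.$")

def first_pass(lines):
    """Split lines into (headword, gloss) pairs and the unparsed remainder."""
    easy, rest = [], []
    for line in lines:
        m = SIMPLE.match(line.strip())
        if m:
            easy.append((m.group(1), m.group(2)))
        else:
            rest.append(line)
    return easy, rest
```

The easy list is what would be written to the separate file; rest is the input for the next pass.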
  27. <spectie> i would put these in a separate file
  28. <spectie> then i would extract the ones with only comma + full stop as punctuation
  29. <spectie> АЙЛАКЕР sly, cunning one.
  30. <spectie> АЙЛАКЕРЛІК slyness, cunning.
  31. <spectie> АЙЛАЛЫ adroit, resourceful.
  32. <spectie> АЙЛАСЫЗ artless, unsophisticated.
  33. <spectie> etc.
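A possible sketch of that second pass: accept entries whose gloss contains only letters, spaces and commas and ends in a single full stop, then split the gloss on commas. COMMA_ONLY and second_pass are hypothetical names; the pattern also matches the one-word entries, so it assumes it runs on whatever an earlier pass left over.

```python
import re

# Headword, then a gloss of ASCII letters, spaces and commas, then a full stop.
COMMA_ONLY = re.compile(r"^(\S+)\s+([A-Za-z ,]+)\.$")

def second_pass(lines):
    """Split lines into (headword, [glosses]) pairs and the unparsed remainder."""
    parsed, rest = [], []
    for line in lines:
        m = COMMA_ONLY.match(line.strip())
        if m:
            glosses = [g.strip() for g in m.group(2).split(",") if g.strip()]
            parsed.append((m.group(1), glosses))
        else:
            rest.append(line)
    return parsed, rest
```

Entries with semicolons, POS tags or ellipses fail the pattern and stay in rest for a later, more careful pass.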
  34. <nathan0n5ire> around 8K start with a kazakh character
  35. <nathan0n5ire> *8K lines
  36. <spectie> nice
  37. <spectie> lines starting with a kazakh character ?
  38. <nathan0n5ire> [АаӘәБбВвГгҒғДдЕеЁёЖжЗзИиЙйКкҚқЛлМмНнҢңОоӨөПпРрСсТтУуҰұҮүФфХхҺһЦцЧчШшЩщЪъЫыІіЬьЭэЮюЯя]
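For reference, that character class can be used as a line filter roughly as follows. The class is copied from the message above; KAZAKH_START and kazakh_lines are illustrative names, and how the roughly 8K count was actually produced is not stated in the log.

```python
import re

# Character class as pasted in the chat, anchored to the start of the line.
KAZAKH_START = re.compile(
    r"^[АаӘәБбВвГгҒғДдЕеЁёЖжЗзИиЙйКкҚқЛлМмНнҢңОоӨөПпРрСсТтУуҰұҮүФфХхҺһЦцЧчШшЩщЪъЫыІіЬьЭэЮюЯя]"
)

def kazakh_lines(lines):
    """Return only the lines whose first character is a Kazakh Cyrillic letter."""
    return [line for line in lines if KAZAKH_START.match(line)]
```

Calling len(kazakh_lines(all_lines)) on the full dictionary text would give the line count being discussed.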