  1. The Rice Transliteration Standard for Roman Transliteration of Telugu
  2.  
  3. Roman transliteration of Telugu simply means writing Telugu using
  4. the English (Roman) alphabet. Modern Telugu text has Telugu words, English
  5. words written in Roman or Telugu script, and modern punctuation marks.
  6. Transliteration is merely a way to represent modern Telugu text using
  7. the English alphabet. Transliteration is not software; it is a form of
  8. information representation. Transliteration is (typically) done by
  9. humans, resulting in a file written with the English alphabet and
  10. punctuation marks.
  11. Inverse transliteration is an operation that extracts Telugu and
  12. English from this file. Inverse transliteration can be
  13. implemented in software. It is this software we refer to in this
  14. document. The output of this software is a file, which when printed
  15. contains Telugu written in Telugu script and English written either in
  16. Roman script or Telugu script, and is an approximation to the text
  17. that was transliterated in the first place. It is this output we refer
  18. to here.
  19.  
  20. There can be (and are) several such transliteration schemes. We are
  21. proposing the following scheme as a standard. In this scheme, many
  22. letters can be transliterated in more than one way. Some of them are
  23. designed to cater to varying intuition, some to increase speed, and
  24. some to be fault-tolerant. For the sake of a later reference, the
  25. preferred form of the transliteration of the Telugu alphabet is
  26. presented first. We emphasize that one doesn't need to stick to Table
  27. 1, and it is only a part of the standard.
  28.  
  29. Table 1:
  30. -------
  31. vowels: a aa i ee u oo R Ru e ea ai o oe ou
  32.  
  33.  
  34. plosives
  35. and nasals:
  36. k kh g gh ~m
  37.  
  38. c C j jh ~n
  39.  
  40. T Th D Dh N
  41.  
  42. t th d dh n
  43.  
  44. p f b bh m
  45.  
  46. fluids:
  47.  
  48. y r l v S sh s h L x ~r
  49.  
  50.  
  51. where S is "melika sa" and ~r is "banDi ra".
  52.  
  53.  
  54. Examples:
  55.  
  56. English meaning    Transliteration
  57.  
  58. uncle              maama
  59. ant                cheema
  60. monkey             koeti
  61. play               aaTa
  62. old                paata
  63. important          mukhyam
  64. saw (n)            rampam
  65. eggplant           vankaaya
  66. order              aaj~na
  67.  
  68. Software takes care of "guDintaalu" (consonant-vowel combinations) and
  69. "vattulu" (consonant-consonant combinations) automatically. This is
  70. not the only way to transliterate these words, though. There are
  71. several other ways. Many letters have alternatives (equivalents), as
  72. in the following table.
  73.  
  74. Table RTS:
  75. _____________________________________________________________________
  76.  
  77.  
  78. a aa=aaa=a' i ee=ii=ia=i' u oo=uu=U=ua=u'
  79.  
  80. R Ru e ea=ae=E=e' ai o oe=O=oa=o' au=ou
  81.  
  82. k kh=K=Kh g gh=G=Gh ~m
  83.  
  84. c=ch C=Ch j jh=J=Jh ~n
  85.  
  86. T=t' Th=th' D=d' Dh=dh' N=nh
  87.  
  88. t th d dh n
  89.  
  90. p f=P=ph=Ph b bh=B=Bh m
  91.  
  92. y r l v=w S sh s h L=lh=Lh x=ksh ~r
  93.  
  94.  
  95. Throughout, h = H.
  96. alu (archaic) = ~l
  97. aloo (archaic) = ~L
  98. arasunna = @M
  99. visarga = @h
  100. avagraha (used in Sanskrit) = @2
  101. na pollu (archaic) = @n
  102. null operation = _ (underscore) (see below)
  103.  
  104. Syllable break = ^ (see below)
  105. Force combination = & (see below)
  106. For "sunnaa", see below.
  107.  
  108. tcha (allophone of c, now extinct) = ~c
  109. tja (allophone of j, now extinct) = ~j
  110.  
  111. ________________________________________________________________________
  112.  
  113.  
  114. Example. The Telugu word meaning monkey can be transliterated as any of
  115. the following: koati, koeti, kOti, ko'ti. The same information is
  116. represented by all of them. Any of these can be chosen, based on
  117. personal preference or convenience.
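
The equivalence of these spellings can be sketched in code. The following is a minimal illustration, not the authors' software: it maps each alternative in Table RTS to the preferred Table 1 spelling using a greedy longest-match scan. The names GROUPS, ALT and normalize are hypothetical, and the greedy scan is an assumption; the standard does not say how ambiguous letter sequences are to be split. The "More equivalents" and "sunnaa" rules are not handled here.

```python
# Alternatives from Table RTS, keyed by the preferred Table 1 spelling.
GROUPS = {
    "aa": ["aaa", "a'"],
    "ee": ["ii", "ia", "i'"],
    "oo": ["uu", "U", "ua", "u'"],
    "ea": ["ae", "E", "e'"],
    "oe": ["O", "oa", "o'"],
    "ou": ["au"],
    "kh": ["K", "Kh"],
    "gh": ["G", "Gh"],
    "c": ["ch"],
    "C": ["Ch"],
    "jh": ["J", "Jh"],
    "T": ["t'"],
    "Th": ["th'"],
    "D": ["d'"],
    "Dh": ["dh'"],
    "N": ["nh"],
    "f": ["P", "ph", "Ph"],
    "bh": ["B", "Bh"],
    "v": ["w"],
    "L": ["lh", "Lh"],
    "x": ["ksh"],
}

# Every alternative (and the preferred form itself) points at the preferred form.
ALT = {alt: pref for pref, alts in GROUPS.items() for alt in alts + [pref]}
MAX_LEN = max(len(key) for key in ALT)


def normalize(word):
    """Rewrite a transliterated word using the preferred Table 1 spellings."""
    word = word.replace("H", "h")  # "Throughout, h = H."
    out = []
    i = 0
    while i < len(word):
        # Assumption: try the longest candidate first (greedy matching).
        for size in range(MAX_LEN, 0, -1):
            piece = word[i:i + size]
            if piece in ALT:
                out.append(ALT[piece])
                i += len(piece)
                break
        else:
            out.append(word[i])  # letters without alternatives pass through
            i += 1
    return "".join(out)


for spelling in ["koati", "koeti", "kOti", "ko'ti"]:
    print(spelling, "->", normalize(spelling))
```

Under these assumptions all four spellings reduce to the same string, which is the sense in which they "represent the same information".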
  118.  
  119.  
  120. Notes.
  121.  
  122. 1. The following symbols are treated as both Telugu and English symbols:
  123. , < . > / ? : * ; + ] } [ { ` " ! $ % ( ) - = 1 2 3 4 5 6 7 8 9 0.
  124. These symbols are transliteration-invariant. That is, these symbols
  125. retain their meaning:
  126.  
  127. mana de'Saaniki "svaatantryam" 1947 lo' vaccindi. kaanii idi
  128. nijangaa svaatantryamaa?
  129.  
  130.  
  131. 2. The following are special characters: ~ @ & ' _ ^ #
  132.  
  133. They have special meanings, as can be noted from Table RTS. (However,
  134. there is a way to print them in the output, as explained later.)
  135.  
  136. 3. Both ' and "a" serve as a vowel-elongation suffix. That is,
  137. a short vowel followed by ' or "a" becomes a long vowel.
  138.  
  139. ceema = ciima = ci'ma = ciama, pOru = poeru = po'ru = poaru
  140.  
  141. 4. There is a retroflex suffix, namely '. That is, "dental plosive
  142. followed by ' becomes a retroflex."
  143.  
  144. aaTa = aat'a, enDa = end'a
  145.  
  146.  
  147. 5. "sunnaa" Generation (Nasal Contraction):
  148. ------------------------------------------
  149.  
  150. All nasals are contracted before plosives as in Rule 1 below. Rule 2,
  151. like Rule 1, improves typo-tolerance.
  152.  
  153. Rule 1. Whenever the letter n or m is followed by one of {k, K, g,
  154. G, c, C, j, J, T, Th, D, Dh, t, th, d, dh, p, P, b, B} (or their
  155. alternatives), it will be converted to sunnaa.
  156.  
  157. Rule 2. Also, whenever the letter m is followed by one of {l, v, s, S},
  158. it will be converted to "sunnaa" automatically.
  159.  
  160. Example: vankaaya, vamkaaya, lankhaNam, lamkhaNam, anga, amga, kance,
  161. kamce, manTa, mamTa, SunTha, SumTha, enDa, emDa, santa, samta,
  162. panthaa, pamthaa, undi, umdi, kampa, kanpa, cembu, cenbu, kaalamloe,
  163. samvatsaram, hamsa, amSa - all generate a "sunnaa" automatically.
  164.  
  165. Force combination:
  166. -----------------
  167. The "sunnaa" generation rules produce unwanted results in rare cases.
  168. The Sanskrit word for acid "aamla" doesn't have a "sunnaa" in it -
  169. we need to force "la-vattu" under ma. Similarly, "kaanpu", "paanpu"
  170. don't have a "sunnaa" in them: we need to force "pa-vattu" under na.
  171. This is done by using "&", as in "aam&la", "kaan&pu", "paan&pu". We
  172. emphasize that & is used only rarely, in special cases such as above.
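
Rules 1 and 2 and the effect of "&" can be sketched as follows. This is an illustration under stated assumptions, not the RIT implementation. The function name contract_nasals is hypothetical. The contracted nasal is written as "M", the symbol the Rice Internal Representation at the end of this document uses for "sunnaa". The sketch looks only at the first letter after the nasal, treats "w" like "v" (since v=w in Table RTS), leaves "~m", "~n" and "@n" alone, and does not apply Rule 2 before "sh" or "lh"; all of these choices are assumptions.

```python
# First letters of the plosives listed in Rule 1 and of their alternatives.
RULE1_STARTS = set("kKgGcCjJTDtdpPbBf")
# Letters listed in Rule 2, plus "w" as the alternative of "v" (assumption).
RULE2_STARTS = set("lvwsS")


def contract_nasals(word):
    """Replace n/m by "M" (sunnaa) wherever Rule 1 or Rule 2 applies."""
    out = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        after = word[i + 2] if i + 2 < len(word) else ""
        # Assumption: "~m", "~n" and "@n" are separate letters, not plain nasals.
        plain_nasal = ch in ("n", "m") and prev not in ("~", "@")
        rule1 = nxt in RULE1_STARTS
        # Assumption: "sh" and "lh" are different letters from "s" and "l".
        digraph = nxt in ("l", "s") and after in ("h", "H")
        rule2 = ch == "m" and nxt in RULE2_STARTS and not digraph
        out.append("M" if plain_nasal and (rule1 or rule2) else ch)
    return "".join(out)


for w in ["vankaaya", "vamkaaya", "kaalamloe", "aamla", "aam&la", "kaan&pu"]:
    print(w, "->", contract_nasals(w))
```

Because "&" is not one of the triggering letters, "aam&la" and "kaan&pu" pass through with the nasal intact, whereas plain "aamla" picks up the unwanted "sunnaa".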
  173.  
  174. Syllable break:
  175. --------------
  176. Suppose we want to write "wrong number" in Telugu script as one word.
  177. If we write "raangnembar", there will be a "na-vattu" under "ge". But
  178. writing "raang^nembar" breaks the syllable after "raang" and writes
  179. "nembar" next to it, without producing the (unwanted)
  180. consonant-consonant combination. That is, k^ is the "praaNa" (pure)
  181. form of ka (without any vowel added to it). [In particular, typing ^
  182. after m generates a "sunnaa".] However, a word ending in a consonant
  183. always assumes ^ at the end by default. That is, we write "shaap" (for
  184. shop), "lak" (for luck) and not "shaap^", "lak^".
  185.  
  186. Null-operation:
  187. --------------
  188.  
  189. "poruguvaad'iki toeDupad'avoeyi" is perhaps too tough on the eye. For
  190. human readability, it may be typed as "porugu_vaad'iki
  191. toeDu_paDa_voeyi". Both represent the same information, including
  192. white spaces. The symbol _ is invisible to the software, that is why
  193. we call it a null-op. (However, _ serves another purpose, as will be
  194. explained later.) We recommend using null-op only when the
  195. transliterated text is supposed to be processed by humans. Otherwise,
  196. typing effort is wasted by breaking the words by null-op, since it
  197. is transparent to the software.
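
A minimal sketch of how software might discard null-ops is given below. The function name strip_null_ops is hypothetical. Keeping a leading underscore is an assumption made here because, as explained later, _ at the start of a word serves another purpose.

```python
def strip_null_ops(text):
    """Remove null-op underscores inside words, keeping any leading one."""
    cleaned = []
    for word in text.split(" "):
        # Assumption: a leading "_" has a separate meaning and must survive.
        lead = "_" if word.startswith("_") else ""
        cleaned.append(lead + word.replace("_", ""))
    return " ".join(cleaned)


print(strip_null_ops("porugu_vaad'iki toeDu_paDa_voeyi"))
```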
  198.  
  199. More equivalents:
  200. ----------------
  201.  
  202. j~n = jn
  203. d'd' = dd'
  204. t't' = tt'
  205.  
  206. How to represent English words:
  207. ------------------------------
  208.  
  209. Consider
  210.  
  211. naa flight delay ayindi
  212.  
  213. in which it is obvious that the second and third words are English.
  214. So, normally there is no need to take any special action when using
  215. English words (which are to be printed in Roman script). Software
  216. should normally be able to handle such a representation. You can skip
  217. the next section which may be read when you run into an unusual
  218. problem.
  219.  
  220. Automatic determination of English words:
  221. ----------------------------------------
  222.  
  223. Since the Rice Transliteration Standard as defined in Table RTS is almost
  224. orthogonal to English [1], we provide automatic determination of
  225. English words. However, there are some rare cases in which it is not
  226. clear whether a word is Telugu or English:
  227.  
  228. me'm ekkad'ikee poem. Sree Sree poem caduvutuu ikkad'e' unt'aam
  229.  
  230. where poem in the first instance is Telugu, in the second English.
  231. There are a few more Telugu words, which when transliterated become
  232. valid English words: are, gala, mana, nee, poem, eg. Based on their
  233. potential frequency, we treat some of them as Telugu and some as
  234. English, by default. For example, we treat "mana" as a Telugu word,
  235. and "are" as an English word, by default. What if we want to use
  236. "mana" as an English word? We simply enclose it by #s thus: # mana #.
  237. Text enclosed between #s is inverse-transliteration-invariant. That
  238. is, it will be printed as it is.
  239.  
  240. Similarly, we write _are to use "are" as a Telugu word. That is, we
  241. have a way to force Telugu using _. In other words, just as we force English
  242. words by enclosing them with #, we force certain Telugu words (rare
  243. cases) by prepending them with _ . Finally, the defaults associated to
  244. the conflicting words can be changed by the users. That is, if a user
  245. wants to change "are" default to Telugu, (s)he can do so by editing a
  246. defaults file.
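
The default-and-override behaviour described above might look like the following sketch. The names DEFAULTS and resolve are hypothetical, and the in-memory dictionary merely stands in for the defaults file, whose format is not specified here. Only the two defaults stated in the text ("mana" and "are") are included; words enclosed in #s are assumed to have been handled before this step.

```python
# Only the two defaults named in the text; the rest of the list is not given.
DEFAULTS = {"mana": "telugu", "are": "english"}


def resolve(token, defaults=DEFAULTS):
    """Return (language, word) for a token that might be Telugu or English.

    A leading "_" forces Telugu. Otherwise the defaults table decides;
    None means the word is not in the table of conflicting words.
    """
    if token.startswith("_"):
        return ("telugu", token[1:])
    return (defaults.get(token), token)


print(resolve("are"))    # default for "are"
print(resolve("_are"))   # forced Telugu
# A user who prefers "are" to default to Telugu overrides the table:
print(resolve("are", dict(DEFAULTS, are="telugu")))
```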
  247.  
  248.  
  249. How to represent Special Characters:
  250. -----------------------------------
  251.  
  252. We noted that @, ~, ^, &, ', _, # are special characters. Suppose the text
  253. to be transliterated has these characters. How do we represent them in
  254. transliteration? We enclose them by #s. That is, # is an ESCAPE
  255. character that toggles transliteration off and on. In other words,
  256. text enclosed between #s is inverse-transliteration-invariant. It will
  257. be printed as it is.
  258.  
  259. Example: #'# prints ', ### prints #, #Hello!# prints Hello!. However,
  260. the single quote ' retains its meaning when it doesn't follow a, i, u,
  261. e, o, t, th, d, dh. Hopefully, future software, in most cases,
  262. determines automatically whether ' is a quote (punctuation mark) or
  263. whether it is a suffix.
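
The toggling behaviour of # can be sketched as a small scanner. This is an illustration only; the function name split_escapes is hypothetical. Treating "###" as a single unit that prints # is an assumption made so that the three examples above come out as stated, since the text does not say how the software parses it.

```python
def split_escapes(text):
    """Split text into (is_literal, chunk) pairs using # as a toggle."""
    segments = []
    buf = []
    literal = False

    def flush():
        if buf:
            segments.append((literal, "".join(buf)))
            buf.clear()

    i = 0
    while i < len(text):
        if not literal and text.startswith("###", i):
            # Assumption: "###" is a unit that prints a single "#".
            flush()
            segments.append((True, "#"))
            i += 3
        elif text[i] == "#":
            flush()
            literal = not literal  # toggle transliteration off or on
            i += 1
        else:
            buf.append(text[i])
            i += 1
    flush()
    return segments


print(split_escapes("naa # mana # undi"))
```

Chunks marked True would be printed as they are; chunks marked False would go on to inverse transliteration.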
  264.  
  265.  
  266. Line-breaks and Verse Environment:
  267. ---------------------------------
  268.  
  269. When typing we may or may not hit return. The `return' key strokes in
  270. the input file have nothing to do with where the line breaks in the
  271. output (except in the verse environment. See below). We start new
  272. paragraphs after a blank line. There is a verse environment, delimited
  273. by |'s, where 'return' keystroke means line break in the output
  274. (equivalent to \obeylines in TeX).
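
The line-break rules can be sketched as follows. This is a hedged illustration, not the RIT implementation; the function name layout is hypothetical. It assumes each | delimiter sits on a line of its own, as in the example file later in this document.

```python
def layout(text):
    """Split input into ("para", text) and ("verse", text) blocks."""
    blocks = []
    para = []
    verse = None  # None outside a verse, a list of lines inside one

    def end_paragraph():
        if para:
            # Returns inside a paragraph do not matter: join with spaces.
            blocks.append(("para", " ".join(para)))
            para.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "|":  # assumption: delimiter alone on its line
            if verse is None:
                end_paragraph()
                verse = []
            else:
                # Inside a verse every return is a line break in the output.
                blocks.append(("verse", "\n".join(verse)))
                verse = None
        elif verse is not None:
            verse.append(line)
        elif stripped == "":
            end_paragraph()  # a blank line starts a new paragraph
        else:
            para.append(stripped)
    end_paragraph()
    if verse is not None:  # unterminated verse: keep what was collected
        blocks.append(("verse", "\n".join(verse)))
    return blocks


sample = "one\ntwo\n\nthree\n|\na\nb\n|\nfour"
print(layout(sample))
```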
  275.  
  276.  
  277. Examples:
  278. --------
  279.  
  280. English meaning    Transliteration, with alternatives
  281.  
  282. uncle              maama, ma'ma
  283. ant                cheema, ceema, chiima, ciama
  284. monkey             kOti, koati, koeti, ko'ti
  285. play               aaTa, aat'a
  286. old                paata
  287. important          mukhyam, muKyam
  288. saw (n)            rampam, ranpam
  289. eggplant           vankaaya, vamkaaya
  290. order              aaj~na, aajna
  291.  
  292. Examples containing English words:
  293.  
  294. Nobody is doing that nowadays and'ee, e'mant'aaru?
  295.  
  296. Modern culture loe TV, videos part and parcel ayipoeyaayand'ee!
  297.  
  298. Examples containing English words written in Telugu:
  299.  
  300. krist'afar kaad'vel aa maat'a eppud'oe ceppad'u.
  301. san^set' bulevaard' meeda oka kaameraa shaap undi.
  302.  
  303. The following is an example file.
  304. -------------------------------------------------------------------
  305. Free verse movement was spearheaded by Kundurti Anjaneyulu . The
  306. movement can be traced back to the 1930s, but it really took off only
  307. recently. The eighties have seen a number of good Telugu poets writing
  308. excellent free verse. But free verse is not necessarily easily
  309. understood. The reason for this is that modern poetry, like modern
  310. life, is complex. While using increasingly complex imagery, modern
  311. poetry also tends to shift its frame of reference to outside, rather
  312. than keeping it inside. Poetry of # Nannaya # and # Peddana # can be
  313. understood (with the help of a dictionary) without references to the
  314. outside society or history. Contrast this with the poetry of T. S.
  315. Eliot. However, this shift is a hallmark of modern poetry, and not
  316. something peculiar to free verse.
  317.  
  318. Some examples of modern free verse follow. In the first example, the
  319. poet expresses his closeness to the soil, to which his umbilical cord is
  320. still attached.
  321.  
  322. |
  323. nagna bhoommeeda nagna de'hantoe Sayaninci_nappaTi anubhavam.
  324. naa naraalu ekkad'oe bhoomi loepali poralloe modalai
  325. naaloeki vyaapinci_natt'u -
  326. bhoomi hRdayamloe janmistunna agni
  327. naa gund'egaa vikasistu_nnatt'u
  328. naakoo bhoomikee oka avinaa_bhaava sambandham
  329. ....
  330. bhoomi vittu andu_loenci ne' putt'u_kostaa.
  331. bhoomi oka naxatra pushpam
  332. andu_loenci ne' parima_Listaa
  333. bhoomi oka nayanam andu_loenci ne' dRshTi saaristaa
  334. ....
  335. |
  336.  
  337. (by K. Siva Reddy, `nagna bhoomeeda ', in
  338. a collection of his poems "mOhanaa! O mOhanaa!", 1988)
  339. --------------------------------------------------------------------
  340.  
  341. Note that `Kundurti Anjaneyulu' is not enclosed by #s, whereas `Nannaya'
  342. and `Peddana' are. The reason is that software should be able to
  343. recognize modern names and handle them appropriately. However, if we
  344. write `kundurti aanjanEyulu', it will be printed in Telugu script.
  345.  
  346. Software:
  347. --------
  348.  
  349. We will present the inverse transliteration software, called Rice
  350. Inverse Transliterator (RIT), in a separate posting.
  351.  
  352.  
  353. We now state the Rice Internal Representation below. This is used as a
  354. common platform for all subsequent text processing tasks such as
  355. typesetting and spell-checking. It is not necessary to know this
  356. representation for transliteration purposes. Only software enthusiasts
  357. may find this useful. Others may skip this section.
  358.  
  359.  
  360. The Rice Internal Representation
  361. --------------------------------
  362.  
  363. Text processing becomes simpler if each Telugu character is
  364. represented by a single ASCII character. Furthermore, the internal
  365. representation serves as a canonical one-to-one mapping between Telugu
  366. alphabet and ASCII. For example, "Th" and "th'" are both represented
  367. internally by "Q". Since the internal representation is not meant to be
  368. read, only to be processed by the software, intuition does not play a
  369. role here. The Rice Internal Representation follows.
  370.  
  371. a A i I u U R H e E y o O w
  372.  
  373. k K g G V
  374. c C j J W
  375. T Q D Z N
  376. t q d z n
  377. p f b B m
  378.  
  379.  
  380. Y r l v S P s h L x F
  381.  
  382.  
  383. The Special Characters:
  384.  
  385. sunnaa = M
  386. visarga = X
  387. alu = ASCII(1)
  388. aloo = ASCII(2)
  389. arasunna = ASCII(5)
  390. avagraha = ASCII(6)
  391. na pollu = ASCII(11)
  392. Syllable break = ASCII(30)
  393. tcha = ^P ASCII (16)
  394. tja = ^Y ASCII (25)
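
The correspondence between Table 1 and the internal representation can be sketched by pairing the two tables position by position. This is only an illustration: the names TABLE1, INTERNAL and to_internal are hypothetical, the input is assumed to be already in the preferred Table 1 spelling, and "sunnaa", "vattulu" and the special characters listed above are not handled. The greedy two-letter matching is likewise an assumption.

```python
# Table 1 letters and the internal codes, in the same order as in the tables.
TABLE1 = (
    "a aa i ee u oo R Ru e ea ai o oe ou "
    "k kh g gh ~m c C j jh ~n T Th D Dh N t th d dh n p f b bh m "
    "y r l v S sh s h L x ~r"
).split()
INTERNAL = (
    "a A i I u U R H e E y o O w "
    "k K g G V c C j J W T Q D Z N t q d z n p f b B m "
    "Y r l v S P s h L x F"
).split()
assert len(TABLE1) == len(INTERNAL)

TO_INTERNAL = dict(zip(TABLE1, INTERNAL))


def to_internal(word):
    """Convert a word in Table 1 spelling to one ASCII character per letter."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in TO_INTERNAL:  # e.g. "aa", "kh", "~m"
            out.append(TO_INTERNAL[pair])
            i += 2
        else:
            # Single letters; anything unknown is passed through unchanged.
            out.append(TO_INTERNAL.get(word[i], word[i]))
            i += 1
    return "".join(out)


for w in ["maama", "koeti", "aaTa", "mukhyam"]:
    print(w, "->", to_internal(w))
```

Inverting the dictionary gives the human-readable Table 1 form back from the internal one.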
  395.  
  396.  
  397. Since the internal representation is not intended to be read by
  398. humans, we need to be able to produce a human readable representation
  399. from this. In case we need to do so, we represent the information
  400. using Table 1, given in the beginning of this document.
  401.  
  402.  
  403. Reference: Ananda Kishore, "On Roman Transliteration of Telugu,"
  404. soc.culture.indian.telugu, revised after posting.
  405.  
  406.  
  407. - Ananda Kishore
  408. Rama Rao Kanneganti