RandomGuy32

Concerning *That* Article

Oct 29th, 2017
The title of the article is already untrue. Unicode contains everything needed to write modern Bangla, including the author's entire name, and has done so for years. The last time a Bengali character relevant to present-day users was added was BENGALI LETTER KHANDA TA in Unicode 4.1 (March 2005). The repertoire has been complete since then; all subsequent additions only affected historic and archaic texts. What the author *really* complains about is that Unicode commonly represents things that we would call "characters" as a sequence of several codepoints, and one of these things is Indic half-consonants, which are usually a sequence of the letter, the virama, and the zero width joiner. This is not an oversight; this is how every single script in Unicode works, not just Bengali, and in fact this specific instance only exists because Unicode is backwards-compatible with ISCII, an Indian standard that was developed in India by Indians to represent the various Indic scripts, one of them Bengali. Most Indic scripts in Unicode are just ISCII shifted to some other code position, with script-specific characters filling the gaps.
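To make the point about codepoint sequences concrete, here is a minimal Python sketch. The helper names `describe` and `block_offset` are made up for illustration, and whether the three-codepoint sequence actually renders as a half form depends on the font and shaping engine in use.

```python
import unicodedata

KA = "\u0995"         # BENGALI LETTER KA
VIRAMA = "\u09CD"     # BENGALI SIGN VIRAMA (hasanta)
ZWJ = "\u200D"        # ZERO WIDTH JOINER
KHANDA_TA = "\u09CE"  # BENGALI LETTER KHANDA TA, a single codepoint since Unicode 4.1

# One user-perceived "character" stored as three codepoints:
# letter, then virama, then zero width joiner.
half_ka = KA + VIRAMA + ZWJ


def describe(text):
    """List each codepoint in text with its official Unicode name (illustrative helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def block_offset(ch, block_start):
    """Position of a character relative to the start of its block (illustrative helper)."""
    return ord(ch) - block_start


for line in describe(half_ka):
    print(line)

# The ISCII-derived layout: KA sits at the same offset in the
# Devanagari block (U+0900) and the Bengali block (U+0980).
print(hex(block_offset("\u0915", 0x0900)), hex(block_offset(KA, 0x0980)))
```

The same offset check works for the virama (U+094D in Devanagari, U+09CD in Bengali), which is what "ISCII shifted to some other code position" looks like in practice.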
It is true that the Unicode Technical Committee itself is quite homogeneous, but it doesn't matter because they are not the ones who come up with encoding models. Proposals for new scripts and characters are researched and submitted by native users and/or well-respected experts. They provide all the information that is necessary for a successful implementation, and the UTC makes sure that everything is okay and that there aren't any open questions. If everything is fine, they then publish the data files and charts based on what the experts considered the best possible solution. Not to mention that Unicode is developed in tandem with ISO/IEC 10646, and the responsible working group (ISO/IEC JTC1/SC2/WG2) consists of national bodies from all over the world (https://en.wikipedia.org/wiki/ISO/IEC_JTC_1/SC_2#Member_countries).
The rules for Han unification were devised by Chinese and Japanese experts. Han characters are managed by the Ideographic Rapporteur Group, of which Unicode is just one member alongside every single country that uses or used Han. The IRG still uses the unification rules to encode new sets. You can look at their group photos to make sure that they aren't all white men: http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg49/IRG49.htm That being said, I personally don't necessarily agree with each and every unification and disunification. Sometimes they go too far and sometimes not far enough.
I don't really have an excuse for emoji. The original two sets (the Japanese carrier sets in 2010 and Wingdings in 2014) were fully justified, as both source sets had been in widespread use for decades prior to their incorporation in Unicode, but everything after that is just stupid. The Fitzpatrick modifiers also should not have been encoded. Vendors should just have changed their fonts to no longer show human emoji with realistic skin tones.
High membership fees make perfect sense because this is an international industry standard used by literally billions of computers worldwide. Submitting proposals and feedback is free and accessible to everyone with an e-mail address; I myself have done so many times. If the author thought that there was something wrong with Unicode, they could have told the Consortium directly rather than spreading misinformation.