« | Home | Categories | »

Picking apart the crashing iOS string

Posted on February 19th, 2018 at 10:29 by John Sinteur in category: News

Okay, I guess I learned a few new things today…



Indic scripts are abugidas; which means that their “letters” are consonants, which you can attach diacritics to to change the vowel. By default, consonants have a base vowel. So, for example, क is “kuh” (kə, often transcribed as “ka”), but I can change the vowel to make it के (the “ka” in “okay”) का (“kaa”, like “car”).

Usually, the default vowel is the ə sound, though not always (in Bengali it’s more of an o sound).

Because of the “default” vowel, you need a way to combine consonants. For example, if you wished to write the word “ski”, you can’t write it as स + की (sa + ki = “saki”), you must write it as स्की. What’s happened here is that the स got its vowel “killed”, and got tacked on to the की to form a consonant cluster ligature.

You can also write this as स्‌की . That little tail you see on the स is known as a “virama”; it basically means “remove this vowel”. Explicit viramas are sometimes used when there’s no easy way to form a ligature, e.g. in ङ्‌ठ because there is no simple way to ligatureify ङ into ठ. Some scripts also prefer explicit viramas, e.g. “ski” in Malayalam is written as സ്കീ, where the little crescent is the explicit virama.

In unicode, the virama character is always used to form a consonant cluster. So स्की was written as <स,  ्, क,  ी>, or <sa, virama, ka, i>. If the font supports the cluster, it will show up as a ligature, otherwise it will use an explicit virama.

For Devanagari and Bengali, usually, in a consonant cluster the first consonant is munged a bit and the second consonant stays intact. There are exceptions – sometimes they’ll form an entirely new glyph (क + ष = क्ष), and sometimes both glyphs will change (ड + ड = ड्ड, द + म = द्म, द + ब = द्ब).

Write a comment