org.json.Kim
Kim makes immutable eight-bit Unicode strings. Kim stands for "Keep it minimal". Every byte contributes 7 bits to the character. If the MSB of a byte is set, then the next byte is a continuation byte: every byte that is not the last byte of a character has the MSB set, and the last byte never has the MSB set. A Unicode character is never longer than 3 bytes. ASCII is unmodified.

Largest code point that fits in a given number of bytes:

                Kim         UTF-8
one byte        U+007F      U+007F
two bytes       U+3FFF      U+07FF
three bytes     U+10FFFF    U+FFFF
four bytes                  U+10FFFF

Characters in the ranges U+0800..U+3FFF and U+10000..U+10FFFF will be one byte smaller when encoded in Kim compared to UTF-8.

Kim is beneficial when using scripts such as Old South Arabian, Aramaic, Avestan, Balinese, Batak, Bopomofo, Buginese, Buhid, Carian, Cherokee, Coptic, Cyrillic, Deseret, Egyptian Hieroglyphs, Ethiopic, Georgian, Glagolitic, Gothic, Hangul Jamo, Hanunoo, Hiragana, Kanbun, Kaithi, Kannada, Katakana, Kharoshthi, Khmer, Lao, Lepcha, Limbu, Lycian, Lydian, Malayalam, Mandaic, Meroitic, Miao, Mongolian, Myanmar, New Tai Lue, Ol Chiki, Old Turkic, Oriya, Osmanya, Pahlavi, Parthian, Phags-Pa, Phoenician, Samaritan, Sharada, Sinhala, Sora Sompeng, Tagalog, Tagbanwa, Takri, Tai Le, Tai Tham, Tamil, Telugu, Thai, Tibetan, Tifinagh, UCAS.

A Kim object can be constructed from an ordinary UTF-16 string, or from a byte array. A Kim object can produce a UTF-16 string.

As with UTF-8, it is possible to detect character boundaries within a byte sequence: a character ends at every byte whose MSB is clear.

UTF-8 is one of the world's great inventions. While Kim is more efficient, it is not clear that it is worth the expense of transition.
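The byte layout described above can be illustrated with a minimal sketch. This is not the org.json.Kim source: the class name KimSketch and its encode and decode methods are hypothetical, written only from the rules stated here (7 bits per byte, MSB set on every byte except the last, at most 3 bytes per character). It assumes well-formed input and does no validation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the Kim byte layout; not the org.json.Kim API.
public class KimSketch {

    // Encode a UTF-16 string: 7 bits per byte, most significant group first.
    // Every byte except the last one of a character has the MSB set.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); ) {
            int c = s.codePointAt(i);          // joins surrogate pairs
            i += Character.charCount(c);
            if (c > 0x3FFF) {
                out.write(0x80 | (c >>> 14));  // first of three bytes
            }
            if (c > 0x7F) {
                out.write(0x80 | ((c >>> 7) & 0x7F));
            }
            out.write(c & 0x7F);               // last byte, MSB clear
        }
        return out.toByteArray();
    }

    // Decode by accumulating 7-bit groups until a byte with the MSB clear.
    // Assumes well-formed input; a trailing incomplete character is dropped.
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        int c = 0;
        for (byte b : bytes) {
            c = (c << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {             // character boundary
                sb.appendCodePoint(c);
                c = 0;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' (U+0041), Hiragana A (U+3042), Deseret Long I (U+10400)
        String text = "A\u3042\uD801\uDC00";
        byte[] kim = encode(text);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("Kim bytes:   " + kim.length);   // 1 + 2 + 3 = 6
        System.out.println("UTF-8 bytes: " + utf8.length);  // 1 + 3 + 4 = 8
        System.out.println("Round trip:  " + text.equals(decode(kim)));
    }
}
```

In this sketch U+3042 takes two bytes and U+10400 takes three, one byte less than UTF-8 in each case, which matches the ranges listed above.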
@version 2013-04-18