thai/thctype.h(3)
NAME
thai/thctype.h
Detailed Description
Thai character classifications.
The Thai Industrial Standards Institute (TISI) defined a Thai
character set for computer use, named TIS-620. This character set
is an 8-bit encoding that includes both English and Thai characters.
Aliases of TIS-620 are TIS620, TIS620-0, TIS620.2529-1,
TIS620.2533-0 and ISO-IR-166.
The following are the encoding values in hexadecimal, the
corresponding Unicode values, and the character names.
0x00 <U0000> NULL (NUL)
0x01 <U0001> START OF HEADING (SOH)
0x02 <U0002> START OF TEXT (STX)
0x03 <U0003> END OF TEXT (ETX)
0x04 <U0004> END OF TRANSMISSION (EOT)
0x05 <U0005> ENQUIRY (ENQ)
0x06 <U0006> ACKNOWLEDGE (ACK)
0x07 <U0007> BELL (BEL)
0x08 <U0008> BACKSPACE (BS)
0x09 <U0009> CHARACTER TABULATION (HT)
0x0A <U000A> LINE FEED (LF)
0x0B <U000B> LINE TABULATION (VT)
0x0C <U000C> FORM FEED (FF)
0x0D <U000D> CARRIAGE RETURN (CR)
0x0E <U000E> SHIFT OUT (SO)
0x0F <U000F> SHIFT IN (SI)
0x10 <U0010> DATA LINK ESCAPE (DLE)
0x11 <U0011> DEVICE CONTROL ONE (DC1)
0x12 <U0012> DEVICE CONTROL TWO (DC2)
0x13 <U0013> DEVICE CONTROL THREE (DC3)
0x14 <U0014> DEVICE CONTROL FOUR (DC4)
0x15 <U0015> NEGATIVE ACKNOWLEDGE (NAK)
0x16 <U0016> SYNCHRONOUS IDLE (SYN)
0x17 <U0017> END OF TRANSMISSION BLOCK (ETB)
0x18 <U0018> CANCEL (CAN)
0x19 <U0019> END OF MEDIUM (EM)
0x1A <U001A> SUBSTITUTE (SUB)
0x1B <U001B> ESCAPE (ESC)
0x1C <U001C> FILE SEPARATOR (IS4)
0x1D <U001D> GROUP SEPARATOR (IS3)
0x1E <U001E> RECORD SEPARATOR (IS2)
0x1F <U001F> UNIT SEPARATOR (IS1)
0x20 <U0020> SPACE
0x21 <U0021> EXCLAMATION MARK
0x22 <U0022> QUOTATION MARK
0x23 <U0023> NUMBER SIGN
0x24 <U0024> DOLLAR SIGN
0x25 <U0025> PERCENT SIGN
0x26 <U0026> AMPERSAND
0x27 <U0027> APOSTROPHE
0x28 <U0028> LEFT PARENTHESIS
0x29 <U0029> RIGHT PARENTHESIS
0x2A <U002A> ASTERISK
0x2B <U002B> PLUS SIGN
0x2C <U002C> COMMA
0x2D <U002D> HYPHEN-MINUS
0x2E <U002E> FULL STOP
0x2F <U002F> SOLIDUS
0x30 <U0030> DIGIT ZERO
0x31 <U0031> DIGIT ONE
0x32 <U0032> DIGIT TWO
0x33 <U0033> DIGIT THREE
0x34 <U0034> DIGIT FOUR
0x35 <U0035> DIGIT FIVE
0x36 <U0036> DIGIT SIX
0x37 <U0037> DIGIT SEVEN
0x38 <U0038> DIGIT EIGHT
0x39 <U0039> DIGIT NINE
0x3A <U003A> COLON
0x3B <U003B> SEMICOLON
0x3C <U003C> LESS-THAN SIGN
0x3D <U003D> EQUALS SIGN
0x3E <U003E> GREATER-THAN SIGN
0x3F <U003F> QUESTION MARK
0x40 <U0040> COMMERCIAL AT
0x41 <U0041> LATIN CAPITAL LETTER A
0x42 <U0042> LATIN CAPITAL LETTER B
0x43 <U0043> LATIN CAPITAL LETTER C
0x44 <U0044> LATIN CAPITAL LETTER D
0x45 <U0045> LATIN CAPITAL LETTER E
0x46 <U0046> LATIN CAPITAL LETTER F
0x47 <U0047> LATIN CAPITAL LETTER G
0x48 <U0048> LATIN CAPITAL LETTER H
0x49 <U0049> LATIN CAPITAL LETTER I
0x4A <U004A> LATIN CAPITAL LETTER J
0x4B <U004B> LATIN CAPITAL LETTER K
0x4C <U004C> LATIN CAPITAL LETTER L
0x4D <U004D> LATIN CAPITAL LETTER M
0x4E <U004E> LATIN CAPITAL LETTER N
0x4F <U004F> LATIN CAPITAL LETTER O
0x50 <U0050> LATIN CAPITAL LETTER P
0x51 <U0051> LATIN CAPITAL LETTER Q
0x52 <U0052> LATIN CAPITAL LETTER R
0x53 <U0053> LATIN CAPITAL LETTER S
0x54 <U0054> LATIN CAPITAL LETTER T
0x55 <U0055> LATIN CAPITAL LETTER U
0x56 <U0056> LATIN CAPITAL LETTER V
0x57 <U0057> LATIN CAPITAL LETTER W
0x58 <U0058> LATIN CAPITAL LETTER X
0x59 <U0059> LATIN CAPITAL LETTER Y
0x5A <U005A> LATIN CAPITAL LETTER Z
0x5B <U005B> LEFT SQUARE BRACKET
0x5C <U005C> REVERSE SOLIDUS
0x5D <U005D> RIGHT SQUARE BRACKET
0x5E <U005E> CIRCUMFLEX ACCENT
0x5F <U005F> LOW LINE
0x60 <U0060> GRAVE ACCENT
0x61 <U0061> LATIN SMALL LETTER A
0x62 <U0062> LATIN SMALL LETTER B
0x63 <U0063> LATIN SMALL LETTER C
0x64 <U0064> LATIN SMALL LETTER D
0x65 <U0065> LATIN SMALL LETTER E
0x66 <U0066> LATIN SMALL LETTER F
0x67 <U0067> LATIN SMALL LETTER G
0x68 <U0068> LATIN SMALL LETTER H
0x69 <U0069> LATIN SMALL LETTER I
0x6A <U006A> LATIN SMALL LETTER J
0x6B <U006B> LATIN SMALL LETTER K
0x6C <U006C> LATIN SMALL LETTER L
0x6D <U006D> LATIN SMALL LETTER M
0x6E <U006E> LATIN SMALL LETTER N
0x6F <U006F> LATIN SMALL LETTER O
0x70 <U0070> LATIN SMALL LETTER P
0x71 <U0071> LATIN SMALL LETTER Q
0x72 <U0072> LATIN SMALL LETTER R
0x73 <U0073> LATIN SMALL LETTER S
0x74 <U0074> LATIN SMALL LETTER T
0x75 <U0075> LATIN SMALL LETTER U
0x76 <U0076> LATIN SMALL LETTER V
0x77 <U0077> LATIN SMALL LETTER W
0x78 <U0078> LATIN SMALL LETTER X
0x79 <U0079> LATIN SMALL LETTER Y
0x7A <U007A> LATIN SMALL LETTER Z
0x7B <U007B> LEFT CURLY BRACKET
0x7C <U007C> VERTICAL LINE
0x7D <U007D> RIGHT CURLY BRACKET
0x7E <U007E> TILDE
0x7F <U007F> DELETE (DEL)
0xA1 <U0E01> THAI CHARACTER KO KAI
0xA2 <U0E02> THAI CHARACTER KHO KHAI
0xA3 <U0E03> THAI CHARACTER KHO KHUAT
0xA4 <U0E04> THAI CHARACTER KHO KHWAI
0xA5 <U0E05> THAI CHARACTER KHO KHON
0xA6 <U0E06> THAI CHARACTER KHO RAKHANG
0xA7 <U0E07> THAI CHARACTER NGO NGU
0xA8 <U0E08> THAI CHARACTER CHO CHAN
0xA9 <U0E09> THAI CHARACTER CHO CHING
0xAA <U0E0A> THAI CHARACTER CHO CHANG
0xAB <U0E0B> THAI CHARACTER SO SO
0xAC <U0E0C> THAI CHARACTER CHO CHOE
0xAD <U0E0D> THAI CHARACTER YO YING
0xAE <U0E0E> THAI CHARACTER DO CHADA
0xAF <U0E0F> THAI CHARACTER TO PATAK
0xB0 <U0E10> THAI CHARACTER THO THAN
0xB1 <U0E11> THAI CHARACTER THO NANGMONTHO
0xB2 <U0E12> THAI CHARACTER THO PHUTHAO
0xB3 <U0E13> THAI CHARACTER NO NEN
0xB4 <U0E14> THAI CHARACTER DO DEK
0xB5 <U0E15> THAI CHARACTER TO TAO
0xB6 <U0E16> THAI CHARACTER THO THUNG
0xB7 <U0E17> THAI CHARACTER THO THAHAN
0xB8 <U0E18> THAI CHARACTER THO THONG
0xB9 <U0E19> THAI CHARACTER NO NU
0xBA <U0E1A> THAI CHARACTER BO BAIMAI
0xBB <U0E1B> THAI CHARACTER PO PLA
0xBC <U0E1C> THAI CHARACTER PHO PHUNG
0xBD <U0E1D> THAI CHARACTER FO FA
0xBE <U0E1E> THAI CHARACTER PHO PHAN
0xBF <U0E1F> THAI CHARACTER FO FAN
0xC0 <U0E20> THAI CHARACTER PHO SAMPHAO
0xC1 <U0E21> THAI CHARACTER MO MA
0xC2 <U0E22> THAI CHARACTER YO YAK
0xC3 <U0E23> THAI CHARACTER RO RUA
0xC4 <U0E24> THAI CHARACTER RU
0xC5 <U0E25> THAI CHARACTER LO LING
0xC6 <U0E26> THAI CHARACTER LU
0xC7 <U0E27> THAI CHARACTER WO WAEN
0xC8 <U0E28> THAI CHARACTER SO SALA
0xC9 <U0E29> THAI CHARACTER SO RUSI
0xCA <U0E2A> THAI CHARACTER SO SUA
0xCB <U0E2B> THAI CHARACTER HO HIP
0xCC <U0E2C> THAI CHARACTER LO CHULA
0xCD <U0E2D> THAI CHARACTER O ANG
0xCE <U0E2E> THAI CHARACTER HO NOKHUK
0xCF <U0E2F> THAI CHARACTER PAIYANNOI
0xD0 <U0E30> THAI CHARACTER SARA A
0xD1 <U0E31> THAI CHARACTER MAI HAN-AKAT
0xD2 <U0E32> THAI CHARACTER SARA AA
0xD3 <U0E33> THAI CHARACTER SARA AM
0xD4 <U0E34> THAI CHARACTER SARA I
0xD5 <U0E35> THAI CHARACTER SARA II
0xD6 <U0E36> THAI CHARACTER SARA UE
0xD7 <U0E37> THAI CHARACTER SARA UEE
0xD8 <U0E38> THAI CHARACTER SARA U
0xD9 <U0E39> THAI CHARACTER SARA UU
0xDA <U0E3A> THAI CHARACTER PHINTHU
0xDF <U0E3F> THAI CHARACTER SYMBOL BAHT
0xE0 <U0E40> THAI CHARACTER SARA E
0xE1 <U0E41> THAI CHARACTER SARA AE
0xE2 <U0E42> THAI CHARACTER SARA O
0xE3 <U0E43> THAI CHARACTER SARA AI MAIMUAN
0xE4 <U0E44> THAI CHARACTER SARA AI MAIMALAI
0xE5 <U0E45> THAI CHARACTER LAKKHANGYAO
0xE6 <U0E46> THAI CHARACTER MAIYAMOK
0xE7 <U0E47> THAI CHARACTER MAITAIKHU
0xE8 <U0E48> THAI CHARACTER MAI EK
0xE9 <U0E49> THAI CHARACTER MAI THO
0xEA <U0E4A> THAI CHARACTER MAI TRI
0xEB <U0E4B> THAI CHARACTER MAI CHATTAWA
0xEC <U0E4C> THAI CHARACTER THANTHAKHAT
0xED <U0E4D> THAI CHARACTER NIKHAHIT
0xEE <U0E4E> THAI CHARACTER YAMAKKAN
0xEF <U0E4F> THAI CHARACTER FONGMAN
0xF0 <U0E50> THAI DIGIT ZERO
0xF1 <U0E51> THAI DIGIT ONE
0xF2 <U0E52> THAI DIGIT TWO
0xF3 <U0E53> THAI DIGIT THREE
0xF4 <U0E54> THAI DIGIT FOUR
0xF5 <U0E55> THAI DIGIT FIVE
0xF6 <U0E56> THAI DIGIT SIX
0xF7 <U0E57> THAI DIGIT SEVEN
0xF8 <U0E58> THAI DIGIT EIGHT
0xF9 <U0E59> THAI DIGIT NINE
0xFA <U0E5A> THAI CHARACTER ANGKHANKHU
0xFB <U0E5B> THAI CHARACTER KHOMUT
Thai characters consist of 44 consonants, vowels, tone marks,
diacritics and Thai digits. Thai vowels are divided into 4 groups:
Leading Vowels (LV), Following Vowels (FV), Below Vowels (BV) and
Above Vowels (AV). There are 4 tone marks, which are positioned
above a consonant. Diacritics are divided into 2 groups: Above
Diacritics (AD) and Below Diacritics (BD).
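As a rough illustration of the code table above, the following C sketch maps a TIS-620 byte to its Unicode code point. It relies only on the pattern visible in the table: the US-ASCII half maps to itself, and each Thai code is offset from U+0E00 by its distance from 0xA0. The name sketch_tis620_to_unicode is hypothetical and is not part of libthai; bytes that have no row in the table are reported as -1.

```c
#include <stdint.h>

/* Hypothetical helper (not a libthai function): map one TIS-620 byte to
 * its Unicode code point, or return -1 if the table assigns no character
 * to that byte. */
int32_t sketch_tis620_to_unicode(unsigned char c)
{
    if (c <= 0x7F)
        return c;                      /* US-ASCII half maps to itself */

    if ((c >= 0xA1 && c <= 0xDA) || (c >= 0xDF && c <= 0xFB))
        return 0x0E00 + (c - 0xA0);    /* 0xA1 -> U+0E01 ... 0xFB -> U+0E5B */

    return -1;                         /* 0x80-0xA0, 0xDB-0xDE, 0xFC-0xFF */
}
```

For example, sketch_tis620_to_unicode(0xA1) yields 0x0E01 (KO KAI), while 0xDB, which falls in a gap of the table, yields -1.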
Libthai defines 4 levels for the position of a character.
· Below level: a character is placed below the consonant.
th_chlevel() returns -1 for these characters.
· Base level: this includes consonants, FV and LV. A character is
placed on the baseline. th_chlevel() returns 0 for these characters.
· Above level: a character is placed just above the consonant.
th_chlevel() returns 1 for these characters.
· Top level: this includes tone marks and diacritics. For plain
character cell rendering, it is safe to put these characters at the
top-most level. However, some rendering engines may lower them when
there is no character at the Above level, for typographical quality.
th_chlevel() returns 2 for these characters.
There is an extra level value, 3, for certain characters that are
usually classified as Above level but may also be placed at Top
level in some rare cases. Two characters fall into this category:
MAITAIKHU and NIKHAHIT.
MAITAIKHU can be placed at Top level when writing some minority
languages such as Kuy, to shorten some syllables with compound
vowels, such as Sara Ia and Sara Uea. NIKHAHIT can be placed at Top
level in Pali/Sanskrit words, to represent the -ng final sound
above SARA I.
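The level values described above can be summarized in a small lookup. The following C sketch is not libthai's implementation of th_chlevel(). The name sketch_chlevel is hypothetical, and the assignment of individual codes to levels is an assumption based on the descriptions above and the character names in the code table (for instance, MAI HAN-AKAT is treated as an above vowel, and THANTHAKHAT and YAMAKKAN as top-level diacritics).

```c
/* Hypothetical sketch (not libthai code): rendering level of a TIS-620
 * byte, following the level descriptions in this page. The per-character
 * assignments are assumptions, not copied from the library. */
int sketch_chlevel(unsigned char c)
{
    switch (c) {
    case 0xD8: case 0xD9:        /* SARA U, SARA UU (below vowels) */
    case 0xDA:                   /* PHINTHU (below diacritic) */
        return -1;

    case 0xD1:                   /* MAI HAN-AKAT */
    case 0xD4: case 0xD5:        /* SARA I, SARA II */
    case 0xD6: case 0xD7:        /* SARA UE, SARA UEE */
        return 1;

    case 0xE8: case 0xE9:        /* MAI EK, MAI THO */
    case 0xEA: case 0xEB:        /* MAI TRI, MAI CHATTAWA */
    case 0xEC:                   /* THANTHAKHAT */
    case 0xEE:                   /* YAMAKKAN */
        return 2;

    case 0xE7:                   /* MAITAIKHU: above, sometimes top */
    case 0xED:                   /* NIKHAHIT: above, sometimes top */
        return 3;

    default:
        return 0;                /* consonants, LV, FV, digits, ASCII, ... */
    }
}
```

A caller that only distinguishes dead characters from base characters could test for a non-zero result. A renderer would additionally treat 3 as "Above, unless another Above-level character is already present".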
The following figure illustrates a Thai word and its characters' levels.
--------------------------- Top(2)
------*-------------------- Top(2)
------*-------------------- Top(2)
----------------------------------------------------- Above(1)
------*---------------*---- Above(1)
---****---------------*---- Above(1)
--------------------------- Above(1)
----------------------------------------------------- Base(0)
--*---*----***-----*--*---- Base(0)
-*-*-*-*--*---*---*-*-*---- Base(0)
--**-*-*------*---**--*---- Base(0)
---**--*---*--*---*---*---- Base(0)
---**--*--*-*-*----*--*---- Base(0)
---*---*--**--*---*---*---- Base(0)
---*---*--*---*---*---*---- Base(0)
---*---*--*****---*****---- Base(0)
--------------------------- Baseline
--------------------------- Below(-1)
-------------------**-*---- Below(-1)
--------------------***---- Below(-1)
--------------------------- Below(-1)
A character placed at the Below, Above or Top level is also called
a dead character. It is usually combined with a consonant: after a
dead character is typed, the cursor does not advance to the next
display cell. BV, BD, TONE, AD and AV are classified as dead
characters.
SYNOPSIS
Functions
- int th_istis (thchar_t c)
Is the character a valid TIS-620 code?
- int th_isthai (thchar_t c)
Is the character a Thai character?
- int th_iseng (thchar_t c)
Is the character an English character?
- int th_isthcons (thchar_t c)
Is the character a Thai consonant?
- int th_isthvowel (thchar_t c)
Is the character a Thai vowel?
- int th_isthtone (thchar_t c)
Is the character a Thai tone mark?
- int th_isthdiac (thchar_t c)
Is the character a Thai diacritic?
- int th_isthdigit (thchar_t c)
Is the character a Thai digit?
- int th_isthpunct (thchar_t c)
Is the character a Thai punctuation mark?
- int th_istaillesscons (thchar_t c)
Is the character a Thai consonant that fits the x-height?
- int th_isovershootcons (thchar_t c)
Is the character a Thai consonant with stem above ascender?
- int th_isundershootcons (thchar_t c)
Is the character a Thai consonant with stem below baseline?
- int th_isundersplitcons (thchar_t c)
Is the character a Thai consonant with split part below baseline?
- int th_isldvowel (thchar_t c)
Is the character a Thai leading vowel?
- int th_isflvowel (thchar_t c)
Is the character a Thai following vowel?
- int th_isupvowel (thchar_t c)
Is the character a Thai upper vowel?
- int th_isblvowel (thchar_t c)
Is the character a Thai below vowel?
- int th_chlevel (thchar_t c)
Position for rendering.
- int th_iscombchar (thchar_t c)
Is the character a combining character?
Function Documentation
- int th_chlevel (thchar_t c)
Position for rendering:
· 3 = above/top
· 2 = top
· 1 = above
· 0 = base
· -1 = below
- int th_istis (thchar_t c)
Is the character a valid TIS-620 code?
TIS-620 here means US-ASCII plus the TIS-620 extension. Character
codes in the CR area (0x80-0x9f), the non-breaking space (0xa0),
and the code gap ranges (0xdb-0xde and 0xfc-0xff) are excluded.
Author
Generated automatically by Doxygen for libthai from the source code.
Version 0.1.6 6 Aug 2006