Country Report - Thailand
International Symposium on Standardization of
Multilingual Information Technology (MLIT97)
Singapore, May 26-28, 1997
Thaweesak Koanantakool Ph.D., Chularat Tanprasert Ph.D., and
Chonchanok Viravan Ph.D.
National Electronics and Computer Technology Center
Bangkok Thailand.
1) Diffusion of Computers in Thailand
Over the past decade, the number of computers in Thailand has increased greatly, especially in Bangkok. People use them for both business and personal purposes. Although many people once used a computer only as a typewriter, the trend has gradually changed: Thai people now use computers to store large amounts of data, to process and compute data, to draw and create images, to develop applications, and much more. According to the survey by the Association of Thai Computer Industry (ATCI) and the Computer Association of Thailand's Vendors Group (CAT-VG), the computer market has grown every year, especially for personal computers and their peripherals. Table 1 and Figure 1 show that PCs account for a large share of all computers in Thailand. However, growth in the PC and computer market in Thailand is forecast to slow because of the economic slowdown and political uncertainty.
Table 1. Thailand IT market outlook, 1995-1997 (value in million US dollars)
| Category | 1995 Actual: Value | 1995 Actual: % | 1996 Actual: Value | 1996 Actual: % | 1996 Actual: % Change | 1997 Forecast: Value | 1997 Forecast: % | 1997 Forecast: % Change |
|---|---|---|---|---|---|---|---|---|
| Large Scale | 92.6 | 7.85 | 99.76 | 7.11 | 7.74 | 94.68 | 6.11 | -5.09 |
| Medium Scale | 82.76 | 7.01 | 92.80 | 6.61 | 12.12 | 101.76 | 6.57 | 9.66 |
| Small Scale | 81.48 | 6.91 | 104.00 | 7.41 | 27.64 | 130.00 | 8.40 | 25.00 |
| Workstation | 110.16 | 9.34 | 111.80 | 7.97 | 1.49 | 143.16 | 9.25 | 28.05 |
| Special Purpose | 22.52 | 1.91 | 20.16 | 1.44 | -10.45 | 22.36 | 1.44 | 10.86 |
| PC & Peripherals | 790.32 | 66.99 | 974.36 | 69.45 | 23.29 | 1056.40 | 68.23 | 8.42 |
| Total | 1179.84 | 100 | 1402.88 | 100 | 18.91 | 1548.36 | 100 | 10.37 |
Source: ATCI & CAT-VG
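The percentage columns of Table 1 can be rechecked from the value columns. The following is a minimal sketch, not part of the ATCI and CAT-VG survey; the names `MARKET`, `totals`, `share`, and `change` are hypothetical and introduced only for illustration.

```python
# Market values in million US dollars, copied from Table 1 (1995, 1996, 1997).
MARKET = {
    "Large Scale": (92.60, 99.76, 94.68),
    "Medium Scale": (82.76, 92.80, 101.76),
    "Small Scale": (81.48, 104.00, 130.00),
    "Workstation": (110.16, 111.80, 143.16),
    "Special Purpose": (22.52, 20.16, 22.36),
    "PC & Peripherals": (790.32, 974.36, 1056.40),
}
YEARS = (1995, 1996, 1997)


def totals():
    """Return the total market value for each year, rounded to two decimals."""
    return tuple(
        round(sum(values[i] for values in MARKET.values()), 2)
        for i in range(len(YEARS))
    )


def share(category, year):
    """Percentage of the year's total market held by one category."""
    i = YEARS.index(year)
    return round(100 * MARKET[category][i] / totals()[i], 2)


def change(category, year):
    """Year-over-year percentage change in value for one category."""
    i = YEARS.index(year)
    if i == 0:
        raise ValueError("no earlier year in the table to compare with")
    previous, current = MARKET[category][i - 1], MARKET[category][i]
    return round(100 * (current - previous) / previous, 2)


if __name__ == "__main__":
    print(totals())
    for name in MARKET:
        print(name, share(name, 1996), change(name, 1996), change(name, 1997))
```

The yearly totals computed this way agree with the Total row. A few of the recomputed percentage changes may differ from the published figures in the second decimal place, presumably because of rounding in the source data.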
Figure 1. Illustrates the ratio of computers in Thailand by value.
However, the Education Ministry of Thailand has a very large project coming: it has announced plans to purchase some 100,000 PCs for 14,000 primary and secondary schools. The budget for this project will be available next year. In addition, the Internet is currently the hottest topic in Thailand, with an estimated 180,000 users served by 16 Internet Service Providers (ISPs). All IT companies are trying to take advantage of the Internet's popularity, and more companies are connecting their PCs to networks and internetworks. This makes PCs even more popular.
2) Thai Language Solutions
The Thai script was created about 700 years ago and has changed gradually since; some characters are no longer used. Contemporary Thai has 44 consonants, 32 vowels, 10 Thai digits, 4 tone marks, and 5 special symbols, as shown in Figure 2. The writing method is multi-level: one line of Thai script can be divided into four levels, as seen in Figure 3. This is quite different from English. Because businesses and government need to operate in Thai, most PCs in Thailand can be used with both Thai and English. The display system must therefore be adapted to display both Thai and English properly.
Figure 2. Illustrates all Thai characters.
Note: The third and fifth consonants (Khokhuat and Kho Khon) are no longer in use. They can be found only in old Thai documents.
Figure 3. Shows the four levels of one Thai sentence.
Text display is simpler for English than for Thai. In English, each line consists of characters placed next to each other from left to right, and one position ("column") contains exactly one display character. When the end of a line is reached, a new line starts at the left, beneath the previous line. Text terminals (such as the DOS screen) for Thai are different because of the four-level writing tradition: display adapter hardware has to be modified to display Thai properly. Six categories of Thai display techniques have been invented and put into real use. Examples of some techniques are illustrated in Figure 4.
- a) Four-level display technique: Thai characters are placed in the character generator chip, so that each display cell shows one Thai or English character. When a Thai sentence is displayed this way, some display cells are wasted: one Thai line takes up the space of four English lines, hence the name. In addition, the line spacing is too large.
- b) Three-level display technique: this is a simple and convenient method for both monitors and printers. It was popular for quite a long time on mainframes, minicomputers, and microcomputers. However, one Thai line still takes up the space of three English lines.
- c) Three levels in a two-level area: the symbols in levels 1, 2, and 4 share the same area. This technique requires special combined symbols for combinations of a level 1 or level 2 character with a level 4 character.
- d) Five-level display technique: English characters and Thai characters of level 3 occupy two levels. This method makes the number of display lines equal for both languages and gives the characters better proportions.
- e) Two planes: every display cell is composed of a middle part (level 3) and upper and lower parts (levels 1, 2, and 4).
- f) Graphic mode: each character can be displayed in any proportion, depending on the Thai driver in use. It can be improved without limitation.
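The idea behind the four-level technique can be sketched in a few lines. The following is only an illustration, not the method of any of the display systems listed above. The assignment of characters to levels 1, 2, and 4 is an assumption based on Unicode code points (the report defines the levels only in Figure 3), and the names `level_of`, `to_cells`, and `render_rows` are hypothetical.

```python
# Assumed level assignment by Unicode code point (an illustration only).
LEVEL_1 = set(range(0x0E48, 0x0E4D))  # tone marks and thanthakhat (topmost)
LEVEL_2 = {0x0E31, 0x0E34, 0x0E35, 0x0E36, 0x0E37, 0x0E47, 0x0E4D, 0x0E4E}  # upper vowels and marks
LEVEL_4 = {0x0E38, 0x0E39, 0x0E3A}    # lower vowels and phinthu (below the baseline)


def level_of(ch):
    """Return the display level (1-4) of a character; level 3 is the baseline."""
    cp = ord(ch)
    if cp in LEVEL_1:
        return 1
    if cp in LEVEL_2:
        return 2
    if cp in LEVEL_4:
        return 4
    return 3


def to_cells(text):
    """Group characters into columns: one level-3 character plus its stacked marks."""
    cells = []
    for ch in text:
        level = level_of(ch)
        if level == 3:
            cells.append({3: ch})
        elif cells:
            cells[-1][level] = ch
        else:
            # A mark with no preceding base character gets a blank base.
            cells.append({3: " ", level: ch})
    return cells


def render_rows(text):
    """Four-level display: return four text rows (levels 1 to 4) for one Thai line."""
    cells = to_cells(text)
    return ["".join(cell.get(level, " ") for cell in cells) for level in (1, 2, 3, 4)]


if __name__ == "__main__":
    sample = "\u0e17\u0e35\u0e48"  # tho thahan + sara ii + mai ek: one column, three levels
    print(len(to_cells(sample)), render_rows(sample))
```

Under these assumptions, a Thai line of n base characters occupies n columns but four text rows, which is the waste of display cells that item a) describes.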
The basic system for displaying Thai characters on monitors or printers is responsible only for showing the right Thai characters. The remaining responsibilities fall to display system developers, who write software to control all the display equipment involved so that it conforms to the standard and works with application programs. Several de facto standards for Thai character codes have existed in the past, but only two Thai character sets remain popular today. One is the formal standard code TIS 620-2533, and the other is the de facto code implemented by Professor Yuen Poovorawan at Kasetsart University. The two differ in the location of each Thai character in the extended ISO 646 code table and in the number of Thai characters. Most PC application software that works in Thai supports both codes. TIS 620-2533 seems to be becoming more popular and more standard as time passes.
3) Software Applications with Thai Language
Computers were at first used mainly with English data, so their designs suit data in English or similar languages. Thai is similar to English in some respects of computer input and output and different in others, so some adaptation is required before any application software can display Thai. The differences are that the two languages use different alphabets and that all English characters sit on one level, whereas the characters of a Thai sentence may be placed on four levels. There are several similarities as well: both languages form words from a fixed set of characters; each has fewer than 256 characters, so one byte of memory can represent any character; and characters in both languages are drawn as patterns of pixels.
Figure 4. Examples of display techniques: a) four-level display, b) three-level display, c) two-level display, d) two planes, e) five-level display.
Since both languages represent each character with a one-byte code, most application software made for English data can be made to work with Thai data as well. Such software usually does not need to know that it is working with Thai; instead, additional programs are inserted at three points: keyboard input, output to the monitor, and output to the printer. All three are usually implemented as Thai drivers, for example VTHAI, TSM, and DYNA. However, some application software uses the last part of each ASCII code for control purposes; such software must be modified (localized) before it can be used with Thai data.
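The one-byte property can be checked with the `tis_620` codec in Python's standard library. The sketch below also shows what happens when a program strips the eighth bit of each byte; reading "the last part of each ASCII code" as the eighth bit is an assumption, and the helper names `encode_thai` and `strip_high_bit` are hypothetical.

```python
def encode_thai(text):
    """Encode text as TIS-620: one byte per character, ASCII left unchanged."""
    return text.encode("tis_620")


def strip_high_bit(data):
    """Simulate software that keeps only the low 7 bits of every byte."""
    return bytes(b & 0x7F for b in data)


if __name__ == "__main__":
    thai = "\u0e44\u0e17\u0e22"  # the word "Thai" in Thai script (three characters)
    encoded = encode_thai(thai)
    print(len(thai), len(encoded))       # same count: one byte per character
    print(encoded.hex())                 # every Thai byte has its high bit set
    print(strip_high_bit(encoded))       # Thai bytes collapse into unrelated ASCII
    print(encode_thai("PC " + thai)[:3])  # the ASCII part is untouched
```

A program that treats bytes as opaque data passes Thai text through unchanged, which is why a driver at the input and output points is often enough. A program that masks or reuses the eighth bit destroys the Thai bytes and has to be localized instead.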
Nevertheless, some application software can do more than just display Thai data. Additional capabilities include sorting, searching, and word cutting (Thai is written without spaces between words). Software that can handle Thai can be categorized as follows:
- Office systems: word processors, spreadsheets, presentation software, accounting software, etc. All such software currently in use by Thai people displays Thai quite well, and some of it can do more with Thai data, such as sorting and searching, depending on the characteristics of the software. Examples are Microsoft Office, Lotus SmartSuite, and ThaiSoft's Geneus.
- Database management systems: most DBMSs in Thailand can work with Thai, for example Oracle, Progress, FoxPro, and Microsoft Access.
- CAI applications: all imported authoring tools currently sold in Thailand can work with Thai, for example Authorware, ToolBook, and Quest. There are also authoring tools developed by Thai people: ThaiTAS (Thai Authoring System) and Chula CAI.
- Optical character recognition: three Thai and English OCR packages have been available on the market since last year, each with a recognition rate of about 90% or higher. They are ArnThai (by NECTEC), ThaiOCR, and T-Rec.
- Search tools for Thai: Thai search engines have been available for several years. However, their performance usually depends on the capability of the Thai word-cutting algorithm, and no word-cutting method is 100% correct. Many researchers have recently worked on word-break and search algorithms. Soft computing techniques and message context are the two technologies most often applied to the problem. A bilingual (Thai and English) search engine is available for testing at URL: https://kanchanapisek.or.th/kp6/GENERAL/search/search.htm.
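To see why word cutting cannot be 100% correct, consider a simple dictionary-based longest-match cutter. This is a common baseline, swapped in here for illustration; it is not the soft computing or context-based approach mentioned above. The function name `longest_match` and the tiny word lists are hypothetical.

```python
def longest_match(text, dictionary):
    """Cut text into words by always taking the longest dictionary word at each position."""
    words = []
    max_len = max(len(word) for word in dictionary)
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                words.append(candidate)
                i += size
                break
        else:
            # No dictionary word starts here: emit one character so the scan advances.
            words.append(text[i])
            i += 1
    return words


if __name__ == "__main__":
    # An unambiguous sentence: "Thai people speak the Thai language".
    lexicon = {"คน", "ไทย", "พูด", "ภาษา", "ภาษาไทย"}
    print(longest_match("คนไทยพูดภาษาไทย", lexicon))

    # An ambiguous string: it can be read as "ตา กลม" (round eyes)
    # or "ตาก ลม" (exposed to the wind). Longest match always picks the second.
    ambiguous = {"ตา", "ตาก", "กลม", "ลม"}
    print(longest_match("ตากลม", ambiguous))
```

The second example shows the limitation: the greedy choice is fixed by the dictionary alone, so the intended reading can be recovered only by looking at the surrounding context.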
4) Multi-Lingual Solutions
Bilingual systems, supporting both Thai and English, were introduced around the seventies. More applications with interfaces in both languages have become available. Users can switch from one language to the other very easily by clicking the Thai or English icon with the mouse or by pressing one key on the keyboard.
Only a few multilingual systems are used in Thailand. The languages included are Thai, English, and either Japanese or Chinese. This is made possible by specific software. Moreover, the Internet is currently the hottest computer topic, and the information reachable through it is in many languages, so multilingual solutions are expected to attract more interest from Thai people in the near future.
For example, many Thai people who can read Japanese are able to view Japanese on the net by using the service provided at Shodouka Launchpad (URL: https://www.lfw.org/shodouka). The Shodouka site lets users view the kana and kanji in Japanese documents on the Internet. For Thai, there is a site that provides techniques for viewing Thai documents on the net, at URL: https://thaigate.rd.nacsis.ac.jp. This site provides information on reading Thai documents on the web with Netscape and Internet Explorer (https://www.microsoft.com/thailand/thaifont.htm). For Netscape, for example, users must obtain Thai fonts, install them in Windows 95, and then tell Netscape to use those fonts in order to view Thai with Netscape on Windows 95.
5) Availability and Popularity of Formal Standards to Support Thai Language
- TIS 620-2533 Standard for Thai Character Codes for Computers: The first version was established as TIS 620-2529 in June 1986. It lacked details about the code point and name of each character, so a second edition was required. The new edition was declared the standard in August 1990. It follows the rules of ISO 646-1983 (Information processing - ISO 7-bit coded character set for information interchange) and GX20-1850-4 (IBM System/370 Reference Summary). The standard code is defined by extending the ISO 646 table and also the EBCDIC table for Thai. The standard contains 44 consonants, 20 vowels, 4 tone marks, 10 Thai digits, and 9 special symbols.
- TIS 820-2538 Layout of Thai Characters on Computer Keyboards: The old layout of Thai characters on computer keyboards was set in February 1989. To make it more suitable for use, the old version was cancelled and the new one was set in September 1995. This standard follows ISO 1091:1977 (Typewriters - Layout of printing and function keys), ISO 2530:1975 (Keyboard for international information processing interchange using the ISO 7-bit coded character set - Alphanumeric area), and TIS 620-2533.
- TIS 988-2533 Recommendation for Thai Combined Character Codes and Symbols for Line Graphics for Dot-Matrix Printers: Printing a Thai document coded in TIS 620-2533 on a dot-matrix printer requires four passes for each Thai line. To increase printing speed, the level-2 Thai vowels were combined with tone marks and with some special symbols on level 2 into single characters, so that the printer prints them together in one pass and only three passes are needed for one Thai line. Characters can be combined in different ways. With an intelligent printer, which contains its own CPU, single characters can be combined just before they are sent to the printer; with a dumb printer, the printing hardware is modified. The codes of the combined characters were designed in several patterns, which caused incompatibility among printer brands. This standard was therefore established in August 1990 to define standard combined characters so that printers of different brands can be used with the computer. There are 25 combined characters in total. The standard draws on research done by one of the authors at the National Electronics and Computer Technology Center (NECTEC), Thailand.
- TIS 1074-2535 Standard for 6-bit Teletype Codes: Telegraph and telex messages can be sent and received using the International Telegraph Alphabet No. 2 (ITA 2), or CCITT Recommendation F.1, which includes only English characters and is only 5 bits long. In Thailand this activity must handle both Thai and English data, which can be achieved with 6 bits. The standard was announced in January 1992 so that all teletypes in Thailand use the same standard code.
- TIS 1075-2535 Standard for Conversion Between Computer Codes and 6-bit Teletype Codes: Computers are now used together with teletypes. To make it convenient to convert codes between the two systems, this standard was designed and announced for use in January 1992. The conversion between computer codes and 6-bit teletype codes is designed to follow TIS 620-2533, TIS 1074-2535, ISO 646:1983, and GX20-1850-4 (IBM System/370 Reference Summary).
- TIS 1099-2535 Standard for Province Identification Codes for Data Interchange: A great deal of data interchange currently involves the provinces of Thailand, but organizations use different provincial codes and therefore cannot exchange data with each other directly. The standard province identification codes are based on the codes used by the Ministry of Interior, the Communication Authority of Thailand, and the Revenue Department of the Ministry of Finance. The standard was announced for use in August 1992. Each provincial code has 2 digits. This version defines 73 provincial codes and reserves 26 more for new provinces established after the announcement of the standard.
- TIS 1111-2535 Standard for Representation of Dates and Times: The display of dates and times is very important in data processing and general communication. This standard was therefore announced in September 1992, applying ISO 8601:1988 (Data elements and interchange formats - Information interchange - Representation of dates and times).
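The four-digit suffix of each TIS designation is the Buddhist Era (B.E.) year of the standard, which is 543 years ahead of the Common Era year; for example, 2533 corresponds to 1990. The sketch below converts such years and writes a date in the ISO 8601 calendar form `YYYY-MM-DD`. It is only an illustration: whether TIS 1111-2535 itself prescribes any Buddhist Era handling is not stated here, and the helper names `tis_year_ce` and `iso_date_from_be` are hypothetical.

```python
from datetime import date

BE_OFFSET = 543  # Buddhist Era year = Common Era year + 543


def tis_year_ce(designation):
    """Return the Common Era year encoded in a designation such as 'TIS 620-2533'."""
    be_year = int(designation.rsplit("-", 1)[1])
    return be_year - BE_OFFSET


def iso_date_from_be(year_be, month, day):
    """Format a date given with a Buddhist Era year as an ISO 8601 calendar date."""
    return date(year_be - BE_OFFSET, month, day).isoformat()


if __name__ == "__main__":
    for name in ("TIS 620-2529", "TIS 620-2533", "TIS 1111-2535", "TIS 820-2538"):
        print(name, tis_year_ce(name))
    # The first day of this symposium, 26 May B.E. 2540.
    print(iso_date_from_be(2540, 5, 26))
```

The converted years match the announcement dates given in the list above (1986, 1990, 1992, and 1995).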
More standards on the Thai language for computing applications are currently under development. Standards in draft include the Thai Input/Output Methods; Electronic Data Interchange: Syntax Rules; Electronic Data Interchange: Data Elements Directory; and Electronic Data Interchange: Data Segments Directory. Standards planned for drafting soon include Thai Fonts for Information Interchange and Standard Thai Glyphs for Optical Character Recognition.
6) Current Topics of the Thai Language Support
Although two Thai character codes are in popular use in software, the TIS 620-2533 standard code appears to be used in more products so far. The main current topics concerning the Thai language are
- Thai Optical Character Recognition (Thai OCR),
- Thai Text-to-Speech,
- Thai Speech-to-Text,
- Thai Intelligent Word Cutting,
- Thai Search Engine, and
- Thai - English Machine Translation.
Many educational institutes and private laboratories are trying to solve all these interesting problems. So far only partial solutions have been found, and the research continues. We hope that someday we will reach a 99.99% recognition rate for Thai OCR, stable text-to-speech and speech-to-text software, perfect word-cutting and searching algorithms, and good machine translation.
7) Conclusion
In Thailand, PCs are tending to become more popular, especially outside Bangkok. One reason is the Internet: some Thai Internet providers have targeted residents outside Bangkok as customers, so those people need their own computers and software that works well with Thai. Almost all information systems have Thai language capability. Nevertheless, Thai differs from English in some respects, so there are two ways to make application software work with Thai: implement a Thai driver that interfaces with the application, or localize the application. The choice depends on the features and properties of each piece of software.
More Thai standard codes are now available. In the past, de facto standards from several organizations were used, and they were not the same. The Thai Industrial Standards Institute (https://www.tisi.go.th) therefore appointed TC-536 (Technical Committee on IT) to examine the several existing standard codes for Thai characters used in computers. The internationalization (I18N) approach to multilingual processing in IT caught on in Thailand as early as 1993. The Thai API Consortium (TAPIC) worked out syntax rules and drafted the language-sensitive issues according to the "Locale" descriptions of POSIX and X/Open. The standard character code, TIS 620-2533, was also proposed to ISO for inclusion in the international standard ISO 8859 series (8-bit character codes with extensions for multilingual handling). The Thai standard character code is now designated ISO 8859-11. TIS 620-2533 is also part of ISO 10646 (version 2) and UNICODE with the prefix byte=0E.
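The "prefix byte=0E" relationship can be written as a fixed offset: a TIS 620-2533 byte in the Thai range maps to the Unicode code point whose high byte is 0E and whose low byte is the TIS byte minus A0 (hexadecimal). The following is a minimal sketch that checks this rule against the `tis_620` codec in Python's standard library; the function name `tis620_to_unicode` is hypothetical.

```python
def tis620_to_unicode(byte):
    """Map one TIS-620 byte to a Unicode character using the fixed 0E-prefix offset."""
    if byte < 0x80:
        return chr(byte)  # the lower half is plain ASCII
    if 0xA1 <= byte <= 0xDA or 0xDF <= byte <= 0xFB:
        return chr(0x0E00 + (byte - 0xA0))
    raise ValueError("byte 0x%02X is not assigned in TIS-620" % byte)


if __name__ == "__main__":
    # 0xA1 is the first Thai consonant, KO KAI, which is U+0E01 in Unicode.
    print(hex(ord(tis620_to_unicode(0xA1))))

    # Compare the offset rule with Python's own codec over the Thai range.
    thai_bytes = list(range(0xA1, 0xDB)) + list(range(0xDF, 0xFC))
    print(all(tis620_to_unicode(b) == bytes([b]).decode("tis_620") for b in thai_bytes))
```

Because the mapping is a constant offset, the relative order of the Thai characters is the same in both codes.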
References
- The Association of Thai Computer Industry (ATCI) and the Computer Association of Thailand's Vendors Group (CAT-VG), IT Market Outlook for 1997, March, 1997.
- Surayuth Boonmatat, IT Standardization in Thailand, Country report for the 10th Asian Forum for Standardization of Information Technology, Republic of Korea, October, 1996.
- Thai Industrial Standards Institute, TIS 820-2538: Layout of Thai Character Keys on Computer Keyboards, September, 1995. (in Thai)
- Thai Industrial Standards Institute, TIS 1099-2535: Standard for Province Identification Codes for Data Interchange, August, 1992. (in Thai)
- Thai Industrial Standards Institute, TIS 1075-2535: Standard for Conversion Between Computer Codes and 6-bit Teletype Codes, January, 1992. (in Thai)
- Thai Industrial Standards Institute, TIS 1074-2535: Standard for 6-bit Teletype Codes, January, 1992. (in Thai)
- Thai Industrial Standards Institute, TIS 1111-2535: Standard for Representation of Dates and Times, September, 1992. (in Thai)
- Thaweesak Koanantakool and the Thai Application Programming Interface Consortium, Thai language & Computers: Basic Standard Development of Information Technology of Thailand, 1991. (in Thai)
- Thai Industrial Standards Institute, TIS 620-2533: Standard for Thai Character Codes for Computers, August, 1990. (in Thai)
- Thai Industrial Standards Institute, TIS 988-2533: Recommendation for Thai Combined Character Codes and Symbols for Line Graphics for Dot-Matrix Printers, August, 1990. (in Thai)
Appendix I : Layout of Thai Character Keys on Computer Keyboards (TIS 820-2533)
Appendix II : Standard for Thai Character Codes for Computers (TIS 620-2533)
Appendix III : Recommendation for Thai combined character codes and symbols for line graphics for dot-matrix printers (TIS 988-2533)