Spoken language is the most common means of communication in daily life. Speech technology synthesizes human-like speech to deliver a message from a device to listeners, who do not need to look at a monitor or screen. A text-to-speech (TTS) synthesis system is the practical intermediary for controlling a synthesizer. Given only a chunk of text, the system generates synthetic speech to communicate with users. An automatic announcement system driven by text, for example, is more flexible than one that relies on prerecorded speech.
Vaja 8.0 is the newest version (2018) of the Thai/English text-to-speech synthesizer from NECTEC. It offers two voices, one female and one male. All three of its main parts have been re-engineered, and the whole system has been reorganized into plug-in modules so that any part can be replaced by a future engine or language. On the technical side, the text pre-processing part segments text into syllable-like units instead of words. The grapheme-to-phoneme conversion part uses the new G2P 3.0, a fully machine-learning module, instead of relying on a syllable-pattern dictionary. Finally, the speech synthesizer part uses linguistic information to store and retrieve HMM models from a tree. The product is bundled and available for both Linux and Windows.
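The plug-in layout described above can be pictured with a minimal Python sketch. This is an illustration only and not Vaja's actual API: the class `TtsPipeline`, the three stage types, and the toy stand-in functions are all hypothetical names introduced here to show how three replaceable stages (text pre-processing, grapheme-to-phoneme conversion, synthesis) might be chained.

```python
from dataclasses import dataclass
from typing import Callable, List

# All names below are hypothetical; they are not Vaja's actual API.
Segmenter = Callable[[str], List[str]]        # text -> syllable-like units
G2P = Callable[[List[str]], List[str]]        # units -> phoneme strings
Synthesizer = Callable[[List[str]], bytes]    # phonemes -> raw PCM bytes


@dataclass
class TtsPipeline:
    """Chains three replaceable stages, mirroring a plug-in layout."""
    segment: Segmenter
    g2p: G2P
    synthesize: Synthesizer

    def speak(self, text: str) -> bytes:
        units = self.segment(text)
        phonemes = self.g2p(units)
        return self.synthesize(phonemes)


# Toy stand-ins so the sketch runs; real engines would replace each one.
def toy_segment(text: str) -> List[str]:
    # Splits on whitespace; a real segmenter would find syllable-like units.
    return text.split()


def toy_g2p(units: List[str]) -> List[str]:
    # Lowercases each unit as a placeholder for a phoneme string.
    return [u.lower() for u in units]


def toy_synthesize(phonemes: List[str]) -> bytes:
    # Emits one 16-bit sample of silence (2 bytes) per phoneme symbol.
    return b"\x00\x00" * len(phonemes)


pipeline = TtsPipeline(toy_segment, toy_g2p, toy_synthesize)
audio = pipeline.speak("Hello Vaja")

# Swapping one stage leaves the other two untouched.
pipeline.g2p = lambda units: [u.upper() for u in units]
```

Because each stage is just a callable with a fixed input and output type, a new language or engine only has to honor that contract to be dropped in.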
Software features:
- Output: Thai/English synthetic reading speech
- Text pre-processing: a module for reading from a dictionary is available
- Speech quality: sounds closer to human speech than the previous version
- Voice modification: volume and speed can be tuned
- API: UI and library
- Voice:
  - Name: Nok (female), A (male)
  - Language: Bilingual
  - Sample type: PCM, 44,100 Hz, 16 bits, Mono
  - Style: reading speech
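To make the sample type above concrete, the following Python sketch wraps raw audio in a WAV container with the listed parameters (PCM, 44,100 Hz, 16 bits, mono) using the standard-library `wave` module. The helper names `wrap_pcm_as_wav` and `duration_seconds` are assumptions made for this illustration, and the silent buffer stands in for synthesizer output; nothing here is taken from Vaja itself.

```python
import io
import wave

SAMPLE_RATE = 44_100   # Hz, as listed under "Sample type"
SAMPLE_WIDTH = 2       # bytes per sample (16 bits)
CHANNELS = 1           # mono


def wrap_pcm_as_wav(pcm: bytes) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


def duration_seconds(pcm: bytes) -> float:
    """Length of a raw PCM buffer in seconds at the format above."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


# One second of silence: 44,100 samples of 2 bytes each.
one_second_of_silence = b"\x00\x00" * SAMPLE_RATE
wav_bytes = wrap_pcm_as_wav(one_second_of_silence)
```

At this format, one second of audio occupies 44,100 × 2 × 1 = 88,200 bytes before the WAV header is added, which is useful when estimating storage for generated announcements.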
Hardware requirements:
- Operating system: Windows 7, Windows 8.1, Windows 10
- RAM: 1 GB or higher
- Available storage space: 100 MB or higher
- Sound card: General sound card
Software specification:
- Supported APIs
- Provides text pre-processing tools such as grapheme-to-phoneme conversion
Application:
- Can be plugged into an IVR system
- Can read e-books, email, or documents aloud
- Can be used in a queuing system or information support system
- Can be bundled into presentation or e-learning applications
- Can be used on any responsive device
- Can be plugged into a screen reader
Availability:
- Licensing
Research and development team:
- Speech and Audio Technology Laboratory (SPT, HCCRU)
- email: sawit.kas[at]nectec.or.th
Contact:
- Business and Technology Transfer
- Tel: 0 2564 6900 ext 2346, 2351-2354, 2357, 2382, 2383, 2399
- email: business[at]nectec.or.th
September-11-2018 16:26