Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
DOI: 10.1109/icassp.2005.1415249

Thai Automatic Speech Recognition

Abstract: We describe the development of a robust and flexible Thai Speech Recognizer as integrated into our English-Thai Speech-to-Speech translation system. We focus on the discussion of the rapid deployment of ASR for Thai under limited time and data resources, including rapid data collection issues, acoustic model bootstrap, and automatic generation of pronunciations. Issues relating to the translation and overall system will be reported elsewhere.

Cited by 16 publications (14 citation statements). References 6 publications.
“…Pitch, as the perceptual correlate of the fundamental frequency (F0) of speech signals [1], is a powerful prosodic cue for auditory perception. Pitch features have long been known to be useful for recognition of normal speech, especially for tonal languages such as Mandarin [2,3,4], Cantonese [5,6], Vietnamese [7,8] and Thai [9,10], since pitch can serve as an informative source for distinguishing the tones of tonal languages [11]. In non-tonal languages, for instance English [12,13,14] and Japanese [15,16], it is also feasible to treat pitch as auxiliary information, concatenating it with the acoustic features to improve speech recognition performance.…”
Section: Introduction (mentioning)
confidence: 99%
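The feature concatenation described in this statement can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any cited paper: it assumes a pitch tracker has already produced one F0 value per frame (None for unvoiced frames), and the choice of appended features (log F0 with linear interpolation across unvoiced frames, a symmetric delta, and a voicing flag) and the function names `interpolate_f0` and `append_pitch` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def interpolate_f0(f0: Sequence[Optional[float]]) -> List[float]:
    """Replace unvoiced frames (None) by linear interpolation between the nearest
    voiced frames; leading/trailing unvoiced frames copy the nearest voiced value."""
    voiced = [i for i, v in enumerate(f0) if v is not None]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    filled: List[float] = []
    for i, v in enumerate(f0):
        if v is not None:
            filled.append(v)
            continue
        left = max((j for j in voiced if j < i), default=None)
        right = min((j for j in voiced if j > i), default=None)
        if left is None:
            filled.append(f0[right])
        elif right is None:
            filled.append(f0[left])
        else:
            w = (i - left) / (right - left)
            filled.append(f0[left] + w * (f0[right] - f0[left]))
    return filled


def append_pitch(mfcc: Sequence[Sequence[float]],
                 f0: Sequence[Optional[float]]) -> List[List[float]]:
    """Concatenate [log F0, delta log F0, voicing flag] to every acoustic frame."""
    if len(mfcc) != len(f0):
        raise ValueError("feature and F0 streams must have the same frame count")
    log_f0 = [math.log(v) for v in interpolate_f0(f0)]
    last = len(log_f0) - 1
    out: List[List[float]] = []
    for t, frame in enumerate(mfcc):
        # Symmetric first difference, clamped at the utterance edges.
        delta = (log_f0[min(t + 1, last)] - log_f0[max(t - 1, 0)]) / 2.0
        voicing = 1.0 if f0[t] is not None else 0.0
        out.append(list(frame) + [log_f0[t], delta, voicing])
    return out


# Toy usage: 3 frames of 2-dimensional "MFCCs", middle frame unvoiced.
feats = append_pitch([[1.0, 2.0], [1.5, 2.5], [0.5, 1.0]], [100.0, None, 200.0])
```

Interpolating through unvoiced regions keeps the pitch stream continuous, so the appended dimensions never contain undefined values; the voicing flag preserves the information that those frames had no measured F0.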
“…Each vowel can carry one of five tones: low, mid, high, rising, and falling. When investigating the impact of tone information, we found no performance gain [25]. Therefore, we focused on phone sets without tone features.…”
Section: B. Rapid Model Building for ASR (mentioning)
confidence: 99%
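Dropping tone from the phone set, as this statement describes, amounts to collapsing every toned vowel onto its base vowel before acoustic model training. The sketch below assumes a hypothetical labelling convention in which a toned vowel is written as its base phone plus one tone digit; the phone labels, the digit-to-tone assignment and the helper names are illustrative, not taken from the cited work.

```python
from typing import Dict, List, Set

# Hypothetical labelling convention: a toned vowel is its base phone plus one
# tone digit (0 = mid, 1 = low, 2 = falling, 3 = high, 4 = rising).
TONE_DIGITS = frozenset("01234")


def strip_tone(phone: str) -> str:
    """Map a toned phone label such as 'aa4' to its toneless base phone 'aa'."""
    if len(phone) > 1 and phone[-1] in TONE_DIGITS:
        return phone[:-1]
    return phone


def toneless_lexicon(lexicon: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Rewrite every pronunciation with toneless phones."""
    return {word: [strip_tone(p) for p in phones] for word, phones in lexicon.items()}


def phone_inventory(lexicon: Dict[str, List[str]]) -> Set[str]:
    """Collect the set of phone units the acoustic model would need."""
    return {p for phones in lexicon.values() for p in phones}


# Toy lexicon: the Thai syllable /maa/ with three different tones.
lexicon = {
    "มา": ["m", "aa0"],    # 'come', mid tone
    "หมา": ["m", "aa4"],   # 'dog', rising tone
    "ม้า": ["m", "aa3"],   # 'horse', high tone
}
toned = phone_inventory(lexicon)                       # {'m', 'aa0', 'aa4', 'aa3'}
toneless = phone_inventory(toneless_lexicon(lexicon))  # {'m', 'aa'}
```

In this sketch the three words end up with one shared toneless pronunciation, so any distinction between them has to come from other knowledge sources, such as the language model.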
“…Like Chinese [5], Thai [4] and other languages in Southeast Asia, Vietnamese is a tonal, morpho-syllabic language in which each syllable is represented by a unique word unit (WU) and most WUs are also morphemes, except for some foreign words, mainly borrowed from English and French. Notice that the term WU we use here has a similar meaning to the term character in Chinese.…”
Section: Introduction (mentioning)
confidence: 99%
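Because written Vietnamese separates syllables with spaces, word units (WUs) can be read directly off the text, while words made of several WUs have to be recovered separately. The sketch below uses greedy forward maximum matching against a lexicon, a common baseline for this grouping; it is not claimed to be the segmentation used by the cited systems, and the toy lexicon and helper names are hypothetical.

```python
from typing import List, Set


def to_wus(sentence: str) -> List[str]:
    """Vietnamese writing separates syllables with spaces, so each token is one WU."""
    return sentence.split()


def max_match(wus: List[str], lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching: at each position take the longest run of
    WUs (up to max_len) that is a lexicon word, falling back to a single WU."""
    words: List[str] = []
    i = 0
    while i < len(wus):
        for n in range(min(max_len, len(wus) - i), 0, -1):
            candidate = " ".join(wus[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Hypothetical toy lexicon of multi-WU words.
lexicon = {"sinh viên", "tiếng Việt"}
wus = to_wus("sinh viên học tiếng Việt")   # 5 WUs
words = max_match(wus, lexicon)            # ['sinh viên', 'học', 'tiếng Việt']
```

A language model can then be trained on either token stream: on `wus` (one unit per syllable) or on `words` (lexicon-grouped units).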
“…Each word is composed of one to several WUs with different meanings. For the automatic speech recognition problem, most systems for Chinese [3], Thai [4] or Vietnamese [2] share a similar approach in both acoustic modeling (AC) and language modeling (LM). Specifically, the acoustic modeling is typically based on the decomposition of a syllable into initial and final parts, while the language modeling is trained on WUs or words.…”
Section: Introduction (mentioning)
confidence: 99%
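The initial/final decomposition mentioned in this statement can be illustrated on Mandarin pinyin, whose inventory of initials is well defined. This is a simplified sketch: it works on toneless pinyin, treats syllables spelled with y/w as zero-initial, and uses a hypothetical helper `split_syllable`; Thai or Vietnamese would need their own inventories of initials.

```python
from typing import Tuple

# Standard Mandarin pinyin initials; two-letter initials come first so that
# "zh", "ch", "sh" win over "z", "c", "s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def split_syllable(syl: str) -> Tuple[str, str]:
    """Split a toneless pinyin syllable into (initial, final).
    Zero-initial syllables (including y/w spellings here) return ''."""
    for ini in INITIALS:
        if syl.startswith(ini) and len(syl) > len(ini):
            return ini, syl[len(ini):]
    return "", syl


pairs = [split_syllable(s) for s in ["zhong", "guo", "an", "shi"]]
# [('zh', 'ong'), ('g', 'uo'), ('', 'an'), ('sh', 'i')]
```

In such a scheme the acoustic models are trained on the initial and final units, while the language model operates on whole syllables (WUs) or words, matching the split described above.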