Thai Automatic Speech Recognition

Suebvisai, Sinaporn; Charoenpornsawat, Paisarn; Black, Alan W.; Woszczyna, Monika; Schultz, Tanja

doi:10.1109/icassp.2005.1415249

Cited by 16 publications

(14 citation statements)

References 6 publications

“…Pitch, as a perceptual measurement of fundamental frequency (F0) of speech signals [1], is a powerful prosodic cue for auditory perception. Pitch features have long known to be useful for recognition of normal speech, especially for tonal languages, such as Mandarin [2,3,4], Cantonese [5,6], Vietnamese [7,8] and Thai [9,10], since pitch can serve as an informative source to distinguish different tones in tonal languages [11]. In non-tonal languages, for instance, English [12,13,14] and Japanese [15,16], it is also feasible to treat pitch as an auxiliary information by concatenating with acoustic features to improve speech recognition performance.…”

Section: Introduction (mentioning)

confidence: 99%

On the Use of Pitch Features for Disordered Speech Recognition

Liu et al. 2019

Interspeech 2019

Pitch features have long been known to be useful for recognition of normal speech. However, for disordered speech, the significant degradation of voice quality renders the prosodic features, such as pitch, not always useful, particularly when the underlying conditions, for example, damages to the cerebellum, introduce a large effect on prosody control. Hence, both acoustic and prosodic information can be distorted. To the best of our knowledge, there has been very limited research on using pitch features for disordered speech recognition. In this paper, a comparative study of multiple approaches designed to incorporate pitch features is conducted to improve the performance of two disordered speech recognition tasks: English UASpeech, and Cantonese CUDYS. A novel gated neural network (GNN) based approach is used to improve acoustic and pitch feature integration over a conventional concatenation between the two. Bayesian estimation of GNNs is also investigated to further improve their robustness.

“…Each vowel can carry one of five tones: low, mid, high, rising, and falling. When investigating the impact of tone information, we found no performance gain [25]. Therefore, we focused on phone sets without tone features.…”

Section: B. Rapid Model Building for ASR (mentioning)

confidence: 99%

Flexible Speech Translation Systems

Schultz, Black, Vogel et al. 2006

IEEE Trans. Audio Speech Lang. Process.

Self Cite

Abstract: Speech translation research has made significant progress over the years with many high-visibility efforts showing that translation of spontaneously spoken speech from and to diverse languages is possible and applicable in a variety of domains. As language and domains continue to expand, practical concerns such as portability and reconfigurability of speech come into play: system maintenance becomes a key issue and data is never sufficient to cover the changing domains over varying languages. In this paper, we discuss strategies to overcome the limits of today's speech translation systems. In the first part, we describe our layered system architecture that allows for easy component integration, resource sharing across components, comparison of alternative approaches, and the migration toward hybrid desktop/PDA or stand-alone PDA systems. In the second part, we show how flexibility and reconfigurability is implemented by more radically relying on learning approaches and use our English-Thai two-way speech translation system as a concrete example.

“…Like Chinese [5], Thai [4] and other languages in Southeast Asia, Vietnamese is a tonal, morpho-syllabic language in which each syllable is represented by a unique word unit (WU) and most WUs are also morphemes, except for some foreign words, mainly borrowed from English and French. Notice that the term WU we use here has a similar meaning to the term character in Chinese.…”

Section: Introduction (mentioning)

confidence: 99%

“…Each word is composed of one to several WUs with different meaning. For the automatic speech recognition problem, most systems for Chinese [3], Thai [4] or Vietnamese [2] share a similar approach in both acoustic modeling (AC) and language modeling (LM). Specifically, the acoustic modeling is typically based on the decomposition of a syllable into initial and final parts; while the language modeling is trained on WUs or words.…”

Section: Introduction (mentioning)

confidence: 99%

Vietnamese Automatic Speech Recognition: The FLaVoR Approach

Demuynck, Compernolle 2006

Chinese Spoken Language Processing

Abstract. Automatic speech recognition for languages in Southeast Asia, including Chinese, Thai and Vietnamese, typically models both acoustics and languages at the syllable level. This paper presents a new approach for recognizing those languages by exploiting information at the word level. The new approach, adapted from our FLaVoR architecture [1], consists of two layers. In the first layer, a pure acoustic-phonemic search generates a dense phoneme network enriched with meta data. In the second layer, a word decoding is performed in the composition of a series of finite state transducers (FST), combining various knowledge sources across sub-lexical, word lexical and word-based language models. Experimental results on the Vietnamese Broadcast News corpus showed that our approach is both effective and flexible.
