A first speech recognition system for Mandarin-English code-switch conversational speech

Vu, Ngoc Thang; Lyu, Dau-Cheng; Weiner, Jochen; Telaar, Dominic; Schlippe, Tim; Blaicher, Fabian; Chng, Eng Siong; Schultz, Tanja; Li, Haizhou

doi:10.1109/icassp.2012.6289015

Cited by 120 publications

(93 citation statements)

References 6 publications


“…In addition to the well-established research line in linguistics, implications of CS and other kinds of language switches for speech-to-text systems have recently received some research interest, resulting in some robust acoustic modeling [1][2][3][4][5] and language modeling [6][7][8] approaches for CS speech. Language identification (LID) is a relevant task for the automatic speech recognition (ASR) of CS speech [9][10][11][12].…”

Section: Introduction (mentioning)

confidence: 99%

Code-switching detection using multilingual DNNs

Yılmaz

Heuvel

Leeuwen

2016

2016 IEEE Spoken Language Technology Workshop (SLT)


Automatic speech recognition (ASR) of code-switching speech requires careful handling of unexpected language switches that may occur in a single utterance. In this paper, we investigate the feasibility of using multilingually trained deep neural networks (DNN) for the ASR of Frisian speech containing code-switches to Dutch with the aim of building a robust recognizer that can handle this phenomenon. For this purpose, we train several multilingual DNN models on Frisian and two closely related languages, namely English and Dutch, to compare the impact of single-step and two-step multilingual DNN training on the recognition and code-switching detection performance. We apply bilingual DNN retraining on both target languages by varying the amount of training data belonging to the higher-resourced target language (Dutch). The recognition results show that the multilingual DNN training scheme with an initial multilingual training step followed by bilingual retraining provides recognition performance comparable to an oracle baseline recognizer that can employ language-specific acoustic models. We further show that we can detect code-switches at the word level with an equal error rate of around 17% excluding the deletions due to ASR errors.

“…That is, when aligning recognition results with the reference transcriptions, insertions, deletions, substitutions were evaluated respectively for each language and summed up for overall evaluation. The basic unit for alignment is character for Mandarin and word for English [3] [12], so the accuracies reported here are with respect to characters for Mandarin and to words for English.…”

Section: Experiments Setup (mentioning)

confidence: 99%
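
The per-language scoring described in the statement above can be illustrated with a minimal sketch. It is not the cited authors' scoring tool: the function names, the CJK range check, and the rule that assigns each error to the language of the reference token (or of the hypothesis token for insertions) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def is_mandarin(token: str) -> bool:
    # Assumption: a token is Mandarin if it is a single character in the
    # basic CJK Unified Ideographs range; everything else counts as English.
    return len(token) == 1 and "\u4e00" <= token <= "\u9fff"


def tokenize(text: str) -> List[str]:
    """Split text into Mandarin characters and whitespace-delimited words."""
    tokens: List[str] = []
    word: List[str] = []
    for ch in text:
        if is_mandarin(ch) or ch.isspace():
            if word:
                tokens.append("".join(word))
                word = []
            if not ch.isspace():
                tokens.append(ch)
        else:
            word.append(ch)
    if word:
        tokens.append("".join(word))
    return tokens


def mixed_error_rate(ref_text: str, hyp_text: str) -> Tuple[Dict[str, int], float]:
    """Return per-language error counts and the overall mixed error rate."""
    ref, hyp = tokenize(ref_text), tokenize(hyp_text)
    if not ref:
        raise ValueError("reference must contain at least one token")
    n, m = len(ref), len(hyp)

    # Standard Levenshtein table over the mixed token sequences.
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace, attributing each error to a language.
    errors = {"zh": 0, "en": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:  # substitution
                errors["zh" if is_mandarin(ref[i - 1]) else "en"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:  # deletion
            errors["zh" if is_mandarin(ref[i - 1]) else "en"] += 1
            i -= 1
        else:  # insertion
            errors["zh" if is_mandarin(hyp[j - 1]) else "en"] += 1
            j -= 1

    # Errors from both languages are summed for the overall figure.
    return errors, (errors["zh"] + errors["en"]) / n
```

For example, `mixed_error_rate("我 想 去 shopping mall", "我 想 shopping more")` counts one deleted Mandarin character and one substituted English word against five reference tokens.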

“…We used the Kneser-Ney tri-gram model, started with a background model and then adapted with the transcription of the training set for the target lecture here. The way the recognition accuracy was evaluated followed the earlier works [3], [12]. That is, when aligning recognition results with the reference transcriptions, insertions, deletions, substitutions were evaluated respectively for each language and summed up for overall evaluation.…”

Section: Experiments Setup (mentioning)

confidence: 99%

Minimum Phone Error model training on merged acoustic units for transcribing bilingual code-switched speech

Yeh

Lin

Lee

2012

2012 8th International Symposium on Chinese Spoken Language Processing


This paper proposes to perform Minimum Phone Error (MPE) model training on merged acoustic units for transcribing Mandarin-English code-switched lectures with highly imbalanced language distribution. Some of the acoustic events in Mandarin and English may have very similar characteristics, so the states or Gaussian mixtures representing them can be merged with identical shared parameters. When MPE is performed afterwards, these merged identical states or Gaussian mixtures can form a compact acoustic unit set. In this way MPE can better discriminate the acoustic units of both languages, because similar units are merged while distinct units are differentiated. Significant improvements in recognition accuracy were observed in the preliminary experiments on a real-world bilingual code-switched lecture corpus recorded at National Taiwan University.


“…They discover that clustering all foreign words into their POS classes leads to the best performance. In (Li et al., 2012; Li et al., 2013), the authors propose to integrate the equivalence constraint into language modeling for Mandarin and English Code-Switching speech recorded in Hong Kong.…”

Section: Related Work (mentioning)

confidence: 99%

Exploration of the Impact of Maximum Entropy in Recurrent Neural Network Language Models for Code-Switching Speech

Vu

Schultz

2014

Proceedings of the First Workshop on Computational Approaches to Code Switching

Self Cite


This paper presents our latest investigations of the jointly trained maximum entropy and recurrent neural network language models for Code-Switching speech. First, we explore extensively the integration of part-of-speech tags and language identifier information in recurrent neural network language models for Code-Switching. Second, the importance of the maximum entropy model is demonstrated along with a variety of experimental results. Finally, we propose to adapt the recurrent neural network language model to different Code-Switching behaviors and use them to generate artificial Code-Switching text data.

