Cross-Lingual Phone Mapping for Large Vocabulary Speech Recognition of Under-Resourced Languages

Hai, Van; Xiao, Xiong; Chng, Eng Siong; Li, Haizhou

doi:10.1587/transinf.e97.d.285

Cited by 12 publications

(7 citation statements)

References 19 publications

(23 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Multilingual transfer in ASR often relies on using bottle-neck features (Vesely et al, 2012;Vu et al, 2012;Karafiát et al, 2018) and adapting an acoustic model trained on one language to effectively recognize the sounds of other languages (Schultz and Waibel, 2001;Le and Besacier, 2005;Stolcke et al, 2006;Tóth et al, 2008;Plahl et al, 2011;Thomas et al, 2012;Imseng et al, 2014;Do et al, 2014;Heigold et al, 2013;Scharenborg et al, 2017). However, while most work uses less than 10 languages for model training, we include up to 100 languages in training.…”

Section: Related Workmentioning

confidence: 99%

Massively Multilingual Adversarial Speech Recognition

Adams

Wiesner

Watanabe

et al. 2019

Proceedings of the 2019 Conference of the North

View full text Add to dashboard Cite

We report on adaptation of multilingual endto-end speech recognition models trained on as many as 100 languages. Our findings shed light on the relative importance of similarity between the target and pretraining languages along the dimensions of phonetics, phonology, language family, geographical location, and orthography. In this context, experiments demonstrate the effectiveness of two additional pretraining objectives in encouraging language-independent encoder representations: a context-independent phoneme objective paired with a language-adversarial classification objective. 2 festvox.org/cmu_wilderness/index.html

show abstract

Section: Related Workmentioning

confidence: 99%

Massively Multilingual Adversarial Speech Recognition

Adams

Wiesner

Watanabe

et al. 2019

Proceedings of the 2019 Conference of the North

View full text Add to dashboard Cite

show abstract

“…This makes a full fledged acoustic modeling process impractical for under-resourced languages. Popular approaches are to transfer well-trained acoustic models to under-resourced languages such as universal phone set [2,3], tandem approach [4][5][6], subspace GMMs (SGMMs) [7,8], Kullback-Leibler divergence HMM (KL-HMM) [9,10], crosslingual phone mapping [11][12][13] and its extension, contextdependent phone mapping [14][15][16]19].…”

Section: Introductionmentioning

confidence: 99%

Multilingual exemplar-based acoustic model for the NIST Open KWS 2015 evaluation

Hai

Xiao

et al. 2015

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Self Cite

View full text Add to dashboard Cite

In this paper, we investigate the use of the proposed non-parametric exemplar-based acoustic modeling for the NIST Open Keyword Search 2015 Evaluation. Specifically, kerneldensity model is used to replace GMM in HMM/GMM (Hidden Markov Model / Gaussian Mixture Model) or DNN in HMM/DNN (Hidden Markov Model / Deep Neural Network) acoustic model to predict the emission probability of HMM states. To get further improvement, likelihood score generated by the kernel-density model is discriminatively tuned by the score tuning module realized by a neural network. Various configurations for score tuning module have been examined to show that simple neural network with 1 hidden layer is sufficient to fine tune the likelihood score generated by the kernel-density model. With this architecture, our exemplar-based model outperforms the 9-layer-DNN acoustic model significantly for both the speech recognition and keyword search tasks. In addition, our proposed exemplarbased system provides complementary information to other systems and we can further benefit from system combination.

show abstract

“…Transactions on Information and Systems [26], and in the two conferences: IALP 2012 [27] and ISCSLP 2012 [28]. The work on applying deep neural networks on monolingual speech recognition and cross-lingual phone mapping is published in the two conferences:…”

Section: Contributionsmentioning

confidence: 99%

“…The results in this chapter have been published in: IEICE Transactions on Information and Systems [26], IALP 2012 [27], and ISCSLP 2012 [28].…”

Section: Cross-lingual Phone Mappingmentioning

confidence: 99%

See 1 more Smart Citation

Acoustic modeling for speech recognition under limited training data conditions

Hai¹

Self Cite

View full text Add to dashboard Cite

The development of a speech recognition system requires at least three resources: a large labeled speech corpus to build the acoustic model, a pronunciation lexicon to map words to phone sequences, and a large text corpus to build the language model. For many languages such as dialects or minority languages, these resources are limited or even unavailable-we label these languages as under-resourced. In this thesis, the focus is to develop reliable acoustic models for under-resourced languages. The following three works have been proposed. I would like to express my sincere thanks and appreciation to my supervisor, Dr. Chng Eng Siong (NTU), and co-supervisor, Dr. Li Haizhou (I 2 R) for their invaluable guidance, support and suggestions. Their encouragement also helps me to overcome the difficulties encountered in my research. My thanks also go to Dr. Xiao Xiong (NTU) for his help, discussions during my PhD study time. My thanks also go to my colleagues in the speech group of the International Computer Science Institute (ICSI) including Prof. Nelson Morgan, Dr. Steven Wegmann, Dr. Adam Janin, Arlo Faria for their generous help and fruitful discussions during my internship at ICSI. I also want to thank my colleagues in the speech group in NTU, for their help. I am very comfortable to collaborate with my team mates Guangpu,

show abstract

Cross-Lingual Phone Mapping for Large Vocabulary Speech Recognition of Under-Resourced Languages

Cited by 12 publications

References 19 publications

Massively Multilingual Adversarial Speech Recognition

Massively Multilingual Adversarial Speech Recognition

Multilingual exemplar-based acoustic model for the NIST Open KWS 2015 evaluation

Acoustic modeling for speech recognition under limited training data conditions

Contact Info

Product

Resources

About