Investigating the Impact of Crosslingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition

Farooq, Muhammad Umar; Hain, Thomas

doi:10.21437/interspeech.2022-10916

Cited by 3 publications

(3 citation statements)

References 19 publications

“…The weights are assigned to the fusing languages on the basis of similarity of the source and the target language. The study on cross-lingual acoustic-phonetic similarities using the same mapping network approach observes that the entropy of a <source, target> mapping network shows the language similarities [28]. The same similarity measure is used along with mapping network accuracy to assign the weights.…”

Section: Acoustic Model Fusion (mentioning)

confidence: 99%
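
The citation statement above says that the entropy of a <source, target> mapping network reflects language similarity and that this measure helps set the fusion weights. The exact weighting rule is not given here, so the following is only a minimal sketch under stated assumptions: `mean_entropy` and `similarity_weights` are hypothetical helper names, and the softmax over negative entropies is an illustrative choice, not the authors' formula. It also ignores the mapping network accuracy that the statement mentions as a second factor.

```python
import numpy as np


def mean_entropy(posteriors, eps=1e-12):
    """Average Shannon entropy (in nats) over frames.

    posteriors: array of shape (frames, classes), each row summing to 1.
    """
    p = np.clip(posteriors, eps, 1.0)  # avoid log(0)
    return float(-(p * np.log(p)).sum(axis=1).mean())


def similarity_weights(entropies):
    """Assumed rule: softmax over negative entropies.

    A source language whose mapped posteriors have lower entropy
    receives a larger fusion weight. This rule is an illustration only.
    """
    scores = -np.asarray(entropies, dtype=float)
    scores -= scores.max()  # numerical stability
    w = np.exp(scores)
    return w / w.sum()


if __name__ == "__main__":
    # Toy mapped posteriors from two hypothetical source languages.
    sharp = np.array([[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]])
    flat = np.array([[0.4, 0.3, 0.3], [0.34, 0.33, 0.33]])
    ents = [mean_entropy(sharp), mean_entropy(flat)]
    print(similarity_weights(ents))
```

Under this assumed rule, the source with the sharper (lower-entropy) mapped posteriors is treated as more similar to the target and contributes more to the fusion.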

“…The number of shared phonemes is not a reliable metric to measure language similarities and each participating language in a multilingual system has a different similarity with the target language. Even the balanced language data sampling can cause degradation or improvement due to internal acoustic-phonetic unbalancing [28]. It demands very controlled language mixing for a target language ASR.…”

Section: Introduction (mentioning)

confidence: 99%

Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

Narayana¹,

Hain²

2022

Interspeech 2022

Multilingual speech recognition has drawn significant attention as an effective way to compensate for data scarcity in low-resource languages. End-to-end (e2e) modelling is preferred over conventional hybrid systems, mainly because it requires no lexicon. However, hybrid DNN-HMMs still outperform e2e models in limited-data scenarios. Furthermore, the problem of manual lexicon creation has been alleviated by publicly available trained models for grapheme-to-phoneme (G2P) conversion and text-to-IPA transliteration for many languages. In this paper, a novel approach to hybrid DNN-HMM acoustic model fusion is proposed in a multilingual setup for low-resource languages. Posterior distributions from different monolingual acoustic models, computed on a target-language speech signal, are fused together. A separate regression neural network is trained for each source-target language pair to transform posteriors from the source acoustic model to the target language. These networks require very limited data compared to ASR training. Posterior fusion yields relative gains of 14.65% and 6.5% over the multilingual and monolingual baselines respectively. Cross-lingual model fusion shows that comparable results can be achieved without using posteriors from the language-dependent ASR.
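
The fusion step described in the abstract can be illustrated with a short sketch. It assumes the posteriors from each source acoustic model have already been transformed into the target language's class set by the per-pair regression networks, and it assumes a weighted linear combination followed by renormalisation; the abstract does not state the exact fusion rule, and `fuse_posteriors` is a hypothetical name.

```python
import numpy as np


def fuse_posteriors(mapped, weights):
    """Fuse per-source posteriors that are already in the target class set.

    mapped:  list of arrays, each of shape (frames, classes)
    weights: one non-negative weight per source (normalised here)
    Assumption: fusion is a weighted average; the paper may differ.
    """
    stacked = np.stack(mapped)                # (sources, frames, classes)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                           # weights sum to 1
    fused = np.tensordot(w, stacked, axes=1)  # (frames, classes)
    # Renormalise each frame so it stays a valid distribution.
    return fused / fused.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Two hypothetical source models, two frames, three target classes.
    src_a = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
    src_b = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]])
    fused = fuse_posteriors([src_a, src_b], weights=[0.6, 0.4])
    print(fused.argmax(axis=1))  # most likely class per frame
```

In a hybrid DNN-HMM system the fused posteriors would then feed the decoder in place of a single model's output; that stage is outside the scope of this sketch.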

“…Recently, we have proposed a technique to learn crosslingual acoustic-phonetic similarities on phoneme level [23] which has been used for multilingual and cross-lingual acoustic model fusion [24]. A model is trained to learn mappings from a source language ASR output posterior distributions to that of the target language ASR.…”

Section: Introduction (mentioning)

confidence: 99%

Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition

Hain¹

2023

Interspeech 2023

MUST: A Multilingual Student-Teacher Learning Approach for Low-Resource Speech Recognition

Farooq,

Ahmad,

Hain

2023

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
