Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

Narayana, Darshan Adiga Haniya; Hain, Thomas

doi:10.21437/interspeech.2022-11449

Cited by 2 publications

(4 citation statements)

References 31 publications

“…So, the mapping models' accuracy is calculated for different values of n, where n represents the number of most probable classes. Though the accuracy increases with increasing n, the rate of change is not as large as that observed for phonemes by [24], which implies that the mapping models performed better for phoneme-based hybrid DNN-HMM systems than for e2e systems. Since the mapping models are trained using posterior distributions of ASR outputs, one potential reason could be the detrimental effect of speech recognition systems on the training of mapping models.…”

Section: Mapping Models (mentioning)

confidence: 84%
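
The top-n accuracy described in the statement above can be illustrated with a minimal sketch. It assumes each frame yields a posterior vector over classes and counts a hit when the reference class is among the n most probable ones. The function name `top_n_accuracy` and the toy posteriors are hypothetical and are not taken from the cited work.

```python
def top_n_accuracy(posteriors, targets, n):
    """Fraction of rows whose target class is among the n most probable classes."""
    hits = 0
    for row, target in zip(posteriors, targets):
        # Rank class indices by descending posterior probability.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target in ranked[:n]:
            hits += 1
    return hits / len(targets)


# Toy posteriors over three classes (illustrative values only).
posteriors = [
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.10, 0.20, 0.70],
    [0.40, 0.35, 0.25],
]
targets = [0, 2, 2, 1]

for n in (1, 2):
    print(n, top_n_accuracy(posteriors, targets, n))
```

Accuracy can only stay equal or rise as n grows, which is why the statement compares the rate of increase rather than the direction.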

“…Mapping models in the previous work [24] have been trained at the frame level without considering contextual information, but connected speech is a continuous signal that exhibits co-articulation and temporal smearing. Furthermore, a separate model has been trained for each source-target language pair, giving rise to a requirement of N(N − 1) mapping models.…”

Section: Mapping Models (mentioning)

confidence: 99%
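
The N(N − 1) count in the statement above follows from treating each ordered source-target pair as needing its own model. A small sketch, assuming four language codes chosen purely for illustration:

```python
from itertools import permutations

# Illustrative language codes; any set of N distinct languages works.
languages = ["tam", "tel", "ceb", "jav"]

# Each ordered (source, target) pair with source != target needs its own model.
pairs = list(permutations(languages, 2))

n = len(languages)
print(len(pairs), n * (n - 1))
for source, target in pairs:
    print(f"{source} -> {target}")
```

With N = 4 this gives 12 directed pairs, and the count grows quadratically as languages are added.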

“…As this work extends the previous work, experiments here are done on the same data set as used in [24]. Full Language Packs (FLP) of four low-resource languages from the IARPA BABEL speech corpus [26] (Tamil (tam), Telugu (tel), Cebuano (ceb) and Javanese (jav)) are used for baseline ASR training and evaluation.…”

Section: Experimental Setup, 4.1 Data Set (mentioning)

confidence: 99%

“…Recently, we have proposed a technique to learn cross-lingual acoustic-phonetic similarities at the phoneme level [23], which has been used for multilingual and cross-lingual acoustic model fusion [24]. A model is trained to learn mappings from the output posterior distributions of a source language ASR to those of the target language ASR.…”

Section: Introduction (mentioning)

confidence: 99%
