Abstract: Multilingual speech recognition has drawn significant attention as an effective way to compensate for data scarcity in low-resource languages. End-to-end (e2e) modelling is preferred over conventional hybrid systems, mainly because it requires no lexicon. However, hybrid DNN-HMMs still outperform e2e models in limited-data scenarios. Furthermore, the problem of manual lexicon creation has been alleviated by publicly available trained models of grapheme-to-phoneme (G2P) and text to IPA transliteration for a lot …
“…So, the mapping models' accuracy is calculated for different values of n, where n represents the number of most probable classes. Though the accuracy increases with increasing n, the rate of change is not as large as that observed for phonemes in [24], which implies that the mapping models perform better for phoneme-based hybrid DNN-HMM systems than for e2e systems. Since the mapping models are trained using posterior distributions of ASR outputs, one potential reason could be the detrimental effect of speech recognition systems on the training of mapping models.…”
Section: Mapping Models (mentioning)
confidence: 84%
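The top-n accuracy described in the statement above can be illustrated with a minimal sketch. It assumes the mapping model emits one posterior distribution per frame and that a frame counts as correct when its reference class is among the n most probable classes. The function name `top_n_accuracy` and the toy arrays are hypothetical and are not taken from the cited work.

```python
import numpy as np


def top_n_accuracy(posteriors, targets, n):
    """Fraction of frames whose reference class is among the n most probable classes.

    posteriors: array of shape (frames, classes), one distribution per frame.
    targets:    array of shape (frames,), reference class index per frame.
    """
    posteriors = np.asarray(posteriors)
    targets = np.asarray(targets)
    # Indices of the n largest posteriors per frame; order inside the top n is irrelevant.
    top_n = np.argpartition(-posteriors, n - 1, axis=1)[:, :n]
    hits = (top_n == targets[:, None]).any(axis=1)
    return float(hits.mean())


# Toy data (illustrative only): 4 frames, 3 classes.
posteriors = np.array([
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.10, 0.20, 0.70],
    [0.40, 0.35, 0.25],
])
targets = np.array([0, 2, 1, 2])

for n in (1, 2, 3):
    print(n, top_n_accuracy(posteriors, targets, n))
```

On this toy input the accuracy rises from 0.25 at n = 1 to 0.75 at n = 2 and 1.0 at n = 3, which is the kind of accuracy-versus-n trend the statement compares between systems.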
“…Mapping models in the previous work [24] were trained at the frame level without considering contextual information, but connected speech is a continuous signal that exhibits co-articulation and temporal smearing. Furthermore, a separate model was trained for each source-target language pair, giving rise to a requirement of N(N − 1) mapping models.…”
Section: Mapping Models (mentioning)
confidence: 99%
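The N(N − 1) figure follows from counting ordered source-target pairs, since a model mapping language A to B is distinct from one mapping B to A. The short sketch below enumerates those pairs. The four language codes are used purely as an illustrative assumption about which languages are paired.

```python
from itertools import permutations

# Illustrative language set (assumption): four language codes.
languages = ["tam", "tel", "ceb", "jav"]

# Each ordered (source, target) pair with source != target needs its own mapping model.
pairs = list(permutations(languages, 2))

n = len(languages)
assert len(pairs) == n * (n - 1)
print(len(pairs))  # 12 for N = 4
```

The count grows quadratically with the number of languages, which is the scaling concern the statement raises about per-pair models.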
“…As this work extends the previous work, the experiments here are done on the same data set as used in [24]. Full Language Packs (FLP) of four low-resource languages from the IARPA BABEL speech corpus [26] (Tamil (tam), Telugu (tel), Cebuano (ceb) and Javanese (jav)) are used for baseline ASR training and evaluation.…”
Section: Experimental Setup, 4.1 Data Set (mentioning)
confidence: 99%
“…Recently, we have proposed a technique to learn cross-lingual acoustic-phonetic similarities at the phoneme level [23], which has been used for multilingual and cross-lingual acoustic model fusion [24]. A model is trained to learn mappings from the output posterior distributions of a source-language ASR to those of the target-language ASR.…”