Interspeech 2006
DOI: 10.21437/interspeech.2006-28

Multilingual non-native speech recognition using phonetic confusion-based acoustic model modification and graphemic constraints

Abstract: In this paper we present an automated approach for non-native speech recognition. We introduce a new phonetic confusion concept that associates sequences of native language (NL) phones to spoken language (SL) phones. Phonetic confusion rules are automatically extracted from a non-native speech database for a given NL and SL using both NL's and SL's ASR systems. These rules are used to modify the acoustic models (HMMs) of SL's ASR by adding acoustic models of NL's phones according to these rules. As pronunciati…
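The rule-extraction step described in the abstract might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each utterance comes with a time-aligned SL phone segmentation and a time-aligned NL phone recognition output, and it associates each SL phone with the NL phones that overlap it in time. The function names (`overlapping`, `extract_rules`), the overlap criterion, and the `min_prob` threshold are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict


def overlapping(nl_segments, start, end, min_overlap=0.5):
    """Return the NL phones that lie mostly inside the interval [start, end).

    nl_segments is a list of (phone, start_time, end_time) tuples.
    The 0.5 overlap ratio is an assumption made for this sketch.
    """
    seq = []
    for phone, s, e in nl_segments:
        overlap = min(e, end) - max(s, start)
        if e > s and overlap / (e - s) >= min_overlap:
            seq.append(phone)
    return tuple(seq)


def extract_rules(utterances, min_prob=0.2):
    """Count which NL phone sequences co-occur with each SL phone.

    utterances is a list of (sl_segments, nl_segments) pairs, both given as
    lists of (phone, start_time, end_time). Returns a dict of the form
    {sl_phone: {nl_phone_sequence: relative_frequency}}, keeping only
    sequences whose relative frequency reaches min_prob (hypothetical
    pruning threshold).
    """
    counts = defaultdict(Counter)
    for sl_segments, nl_segments in utterances:
        for sl_phone, start, end in sl_segments:
            nl_seq = overlapping(nl_segments, start, end)
            if nl_seq:
                counts[sl_phone][nl_seq] += 1

    rules = {}
    for sl_phone, counter in counts.items():
        total = sum(counter.values())
        kept = {seq: n / total for seq, n in counter.items()
                if n / total >= min_prob}
        if kept:
            rules[sl_phone] = kept
    return rules


# Toy data: the SL phone "th" is realised as NL "s" twice and as "s z" once.
toy = [
    ([("th", 0.0, 0.1), ("ih", 0.1, 0.2)], [("s", 0.0, 0.1), ("i", 0.1, 0.2)]),
    ([("th", 0.0, 0.2)], [("s", 0.0, 0.1), ("z", 0.1, 0.2)]),
    ([("th", 0.0, 0.1)], [("s", 0.0, 0.1)]),
]
rules = extract_rules(toy)
```

In a system along the lines the abstract describes, each retained rule would then indicate which NL phone models to add alongside the corresponding SL phone's HMM; that modification step is not shown here.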

Cited by 19 publications (15 citation statements)
References 3 publications
“…Possible explanations might be sought in the nature of the variation that characterizes non-native speech. Non native speakers are likely to replace target language phonemes by phonemes from their mother tongue [3,5]. When the non-native speech is heterogeneous in the sense that it is produced by speakers with different mother tongues, as in our case, it may be extremely difficult to capture the rather diffuse pattern of variation by including variants in the lexicon (see also [4]).…”
Section: Discussion the Results Presented In The Previous Section (mentioning)
confidence: 94%
“…First of all, because non-native speech is atypical in many respects and, as such, it poses serious problems to ASR systems [1][2][3][4]. Non-native speech may deviate from native speech with respect to pronunciation, morphology, syntax, and the lexicon.…”
Section: Introduction (mentioning)
confidence: 99%
“…Refer to (Matassoni et al, 2018) for comparisons with a different non-native children speech data set and to scientific literature (Wilpon and Jacobsen, 1996; Das et al, 1998; Li and Russell, 2001; Giuliani and Gerosa, 2003; Potamianos and Narayanan, 2003; Gerosa et al, 2007; Gerosa et al, 2009; Liao et al, 2015; Serizel and Giuliani, 2016) for detailed descriptions of children speech recognition and related issues. Important, although not exhaustive of the topic, references on non-native speech recognition can be found in (Oh et al, 2006; Strik et al, 2009; Steidl et al, 2004; Bouselmi et al, 2006; Duan et al, 2017; Li et al, 2016; Lee and Glass, 2015; Das and Hasegawa-Johnson, 2015). As for language models, accurate transcriptions of spoken responses demand for models able to cope with not well-formed expressions (due to students' grammatical errors).…”
Section: ASR-related Challenges (mentioning)
confidence: 99%
“…Traditional ASR systems would be inefficient in such case as their performance drops drastically when confronted with non-native speech. This performance drop is a well known problem (see [1]).…”
Section: Introduction (mentioning)
confidence: 99%
“…Recent research works for non-native speech have already allowed a significant improvement in that field. The approaches described in [1], [3] and [4] allowed significant performance enhancement against non-native speech. Nevertheless, those approaches require the knowledge of the origin of the speakers uttering the speech they are applied to.…”
Section: Introduction (mentioning)
confidence: 99%