2011
DOI: 10.1121/1.3514525

Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes

Abstract: The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, it is investigated if the most common ASR features contain all information required to recognize speech in noisy environments by using resynthesized ASR features in listening experiments. For the phoneme recognition…

Citations: cited by 36 publications (18 citation statements)
References: 35 publications (38 reference statements)
“…Despite substantial progress in the field of computational auditory scene analysis (CASA) over the past decades, machine-based approaches that attempt to replicate human speech recognition abilities are still far away from being as robust as humans against the detrimental influence of competing sources and interfering noise. Even when considering a very restricted task, e.g., consonant or phoneme recognition, there is still a tremendous difference of 10-15 dB in performance when comparing machine-based recognition with the scores obtained by human listeners (e.g., Sroka and Braida, 2005; Meyer et al, 2011).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Several of these methods are inspired by the principles of human speech perception, which is motivated by the fact that the robustness of human recognition performance exceeds by far the robustness of ASR performance even in acoustically optimal conditions (Lippmann, 1997;Cooke and Scharenborg, 2008;Meyer et al, 2011b). The sources of variability in spoken language can be categorized into extrinsic sources (e.g., background noise, the room acoustics, or distortions of the communication channel) and intrinsic sources, which are associated with the speech signal itself (e.g., the talkers' speaking style, gender, age, mood, etc.).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The individuals with normal hearing include words recognition scores in quiet ≥ 90%, and they have the average obtain 50% performance at a signal-to-noise ratio of 2 to 6 dB, Brandy [21]. Also, for adults with normal hearing, the mean of word recognition scores in white noise at 10 dB signal to noise ratio is equal to 67.7%, Meyer et al [27]. In our study, the total mean of scores at 10 dB signal-to-noise ratio for all subjects was in the normal range.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…The upper limit of scores was acquired for the healthy persons, whereas the lower limit of scores was obtained for affected ears. One reason for low word recognition scores in white noise may be poor auditory processing, Meyer et al [27] and Yilmaz et al [26]. Thus, low-frequency-high-intensity function of vestibular hearing may be useful in auditory processing.…”
Section: Discussion
Citation type: mentioning
confidence: 99%