Interspeech 2016
DOI: 10.21437/interspeech.2016-1119

Microscopic Multilingual Matrix Test Predictions Using an ASR-Based Speech Recognition Model

Abstract: To predict the outcomes of matrix sentence tests in different languages and various noise conditions for native listeners, the simulation framework for auditory discrimination experiments (FADE) and the extended Speech Intelligibility Index (eSII) are employed. FADE uses an automatic speech recognition system to simulate recognition experiments and reports the highest achievable performance as the outcome, an approach that showed good predictions for the German matrix test in noise. The eSII is based on the …

Cited by 11 publications (25 citation statements)
References 21 publications
“…Predictions with FADE were found to be close to the performance of listeners with normal hearing across languages and in important noise conditions (Schädler, Hülsmeier, et al., 2016). The performance of listeners with impaired hearing can be assumed to be decreased.…”
Section: Methods (mentioning)
confidence: 71%
“…Building and evaluating another model was out of the scope of this contribution. The ASR-based modeling approach FADE has been compared with other models using less complex observed data sets for which compatible models existed in speech recognition as well as in basic psychoacoustic tasks (Kollmeier et al., 2016; Schädler, Hülsmeier, et al., 2016; Schädler, Warzybok, et al., 2016; Schädler et al., 2015, 2018). To facilitate the comparison with future models on the same or other data sets, the anonymized observed data, the MHA configurations, the source code of the measurement procedures, the source code of the modeling framework, including the modified feature extraction as well as the evaluation scripts, are available online.…”
Section: Discussion (mentioning)
confidence: 99%
“…In accordance with this observation, Schädler, Hülsmeier, Warzybok, Hochmuth, and Kollmeier (2016a) recently found that independent frequency bands (as assumed, e.g., with the DTW + PEMO approach) for speech recognition in fluctuating noise masker seem to be an untenable assumption. In the domain of ASR, and much in contrast to many models of auditory signal processing, integration across frequency bands is regarded an essential feature to solve the task of speech recognition in noise, and particularly, in fluctuating noise conditions.…”
Section: Introduction (mentioning)
confidence: 68%
“…The nonintrusive classification of noisy signals with FADE puts a natural lower bound on the recognition performance that strongly depends on the signal representation and was found to be in good agreement with empirical measurements of listeners with normal hearing (Schädler et al., 2015). The approach was shown to predict the outcome of the matrix sentence test in different languages in stationary and fluctuating noise conditions (Schädler et al., 2016a). The auditory signal representation, which was originally designed for robust automatic speech recognition, was extended by common signal processing deficiencies to model impaired hearing (Kollmeier, Schädler, Warzybok, Meyer, & Brand, 2015, 2016).…”
Section: Introduction (mentioning)
confidence: 99%
“…Classical modelling approaches, like the AI and the SII, have been adapted to account for hearing impairment, but rely solely on the information provided by the audiogram and have thus only limited applicability (Pavlovic et al, 1986; Payton and Uchanski, 1994; Rhebergen et al, 2010; Meyer and Brand, 2013). On the other hand, sophisticated automatic speech recognition (ASR) based approaches, like the Framework for Auditory Discrimination Experiments (FADE, Schädler et al, 2015; Schädler et al, 2016), while powerful predictors of NH speech intelligibility, offer only limited insights into the involved auditory processes since the cue extraction from the internal representations of the signals is delegated to a highly trained ASR whose performance relies on the amount and type of (over-)training and less on the actual importance of the selected features for human listeners. Furthermore, such models require explicit individualized fitting of parameters in order to account for HI data (Kollmeier et al, 2016).…”
Section: Introduction (mentioning)
confidence: 99%