Interspeech 2022
DOI: 10.21437/interspeech.2022-10182

Non-intrusive Speech Intelligibility Metric Prediction for Hearing Impaired Individuals

Abstract: This paper proposes neural models to predict Speech Intelligibility (SI), both by prediction of established SI metrics and of human speech recognition (HSR) on the 1st Clarity Prediction Challenge. Both intrusive and non-intrusive predictors for intrusive SI metrics are trained, then fine-tuned on the HSR ground truth. Results are reported on a number of SI metrics, and the model choice for the Clarity challenge submission is explained. Additionally, the relationship between the SI scores in the data and common…


Cited by 3 publications (4 citation statements)
References 44 publications (60 reference statements)
“…Recent work in speech enhancement [6,21,22] have found that the outputs of HuBERT's encoder stage $H_{FE}(\cdot)$ are particularly useful for capturing quality-related information, outperforming the final transformer layer and weighted sums of each transformer output. The outputs of $H_{FE}(\cdot)$ are 2D representations with dimensions 512 × T where T depends on the length of the input audio in seconds.…”
Section: HuBERT Encoder Feature Representations (mentioning; confidence: 99%)
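To make the "512 × T" shape concrete, here is a minimal sketch of how T follows from the input length. It assumes that HuBERT's encoder stage $H_{FE}(\cdot)$ is the standard convolutional waveform encoder shared by HuBERT and wav2vec 2.0 (seven unpadded 1-D convolutions, 512 channels, kernel widths 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2) applied to 16 kHz audio. The function names `encoder_frames` and `encoder_output_shape` are hypothetical helpers, not part of any HuBERT library.

```python
# Assumed configuration of HuBERT's convolutional encoder stage H_FE:
# (kernel_width, stride) per 1-D conv layer, no padding, 512 output channels.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
ENCODER_CHANNELS = 512
SAMPLE_RATE = 16_000  # assumed input sample rate in Hz


def encoder_frames(num_samples: int) -> int:
    """Number of output frames T for a waveform of num_samples samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            # Input shorter than the encoder's receptive field (400 samples).
            raise ValueError(f"input too short: {num_samples} samples")
        # Output length of an unpadded, undilated 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


def encoder_output_shape(duration_s: float, sample_rate: int = SAMPLE_RATE) -> tuple[int, int]:
    """Shape (channels, T) of the encoder output for audio of duration_s seconds."""
    num_samples = int(round(duration_s * sample_rate))
    return ENCODER_CHANNELS, encoder_frames(num_samples)


if __name__ == "__main__":
    for seconds in (1.0, 2.0, 3.5):
        channels, frames = encoder_output_shape(seconds)
        print(f"{seconds:.1f} s -> {channels} x {frames}")
```

Under these assumptions the overall stride is 5 · 2⁶ = 320 samples, so one second of audio yields a 512 × 49 representation (about one frame per 20 ms), and T grows linearly with the duration of the input, as the citing statement notes.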
“…The best performing non-intrusive approach [12] uses an uncertainty measure derived from state-of-the-art ASR systems as a proxy for human intelligibility, finding a strong correlation between the two measures. Other successful approaches [13,14] make use of powerful feature representations derived from self-supervised speech representations (SSSRs) as inputs to neural speech intelligibility prediction models, while others use neural network structures which have been shown to be useful in the related task of human speech quality rating prediction [15]. CPC2 differs from CPC1 in that its evaluation sets are disjoint in terms of listener and hearing aid system relative to its training sets.…”
Section: Prior Approaches (mentioning; confidence: 99%)
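The statement above describes using ASR uncertainty as a proxy for human intelligibility and judging it by its correlation with listener scores. The specific uncertainty measure used in [12] is not given here, so, as an assumption, the sketch below uses one common choice: the mean per-frame Shannon entropy of an ASR model's output posteriors. Agreement with listener scores is measured with Pearson's correlation coefficient. The function names are hypothetical and the inputs are toy values, not data from any cited paper.

```python
import math
from typing import Sequence


def mean_frame_entropy(posteriors: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (in nats) over frames of ASR token posteriors.

    Each inner sequence is one frame's probability distribution over output
    tokens. Higher values mean the recogniser is less certain.
    """
    if not posteriors:
        raise ValueError("need at least one frame")
    total = 0.0
    for frame in posteriors:
        # Zero-probability tokens contribute nothing (p * log p -> 0).
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(posteriors)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between per-utterance uncertainty and listener scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy posteriors: a confident frame and a maximally uncertain 2-token frame.
    utterance = [[0.98, 0.01, 0.01], [0.5, 0.5, 0.0]]
    print("mean frame entropy:", mean_frame_entropy(utterance))
    # Toy per-utterance uncertainties vs. toy listener intelligibility (%).
    uncertainty = [0.1, 0.4, 0.9, 1.2]
    intelligibility = [95.0, 80.0, 40.0, 20.0]
    print("Pearson r:", pearson(uncertainty, intelligibility))
```

If higher ASR uncertainty accompanies lower human intelligibility, the correlation would be expected to be negative, and a "strong correlation" refers to its magnitude. A learned mapping from uncertainty to a score would still be needed to predict intelligibility values directly.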
“…A model structure following work on the CPC1 in [14] is chosen for the primary SI prediction network (cf. Section 3.2.1), depicted to the right in Figure 2.…”
Section: Model Structure (mentioning; confidence: 99%)