Non-intrusive Speech Intelligibility Metric Prediction for Hearing Impaired Individuals

Close, George; Hollands, Samuel; Goetze, Stefan; Hain, Thomas

doi:10.21437/interspeech.2022-10182

Cited by 3 publications

(4 citation statements)

References 44 publications

(60 reference statements)


“…Recent work in speech enhancement [6,21,22] have found that the outputs of HuBERT's encoder stage H_FE(·) are particularly useful for capturing quality-related information, outperforming the final transformer layer and weighted sums of each transformer output. The outputs of H_FE(·) are 2D representations with dimensions 512 × T where T depends on the length of the input audio in seconds.…”

Section: Hubert Encoder Feature Representations (mentioning)

confidence: 99%
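
The relationship between audio length and T in the statement above can be made concrete with a small calculation. The sketch below assumes the convolutional feature encoder of the standard HuBERT Base configuration (seven 1-D convolutions without padding, 16 kHz input); the quoted statement does not say which HuBERT variant is used, so the layer list, sample rate, and function names here are illustrative assumptions rather than the cited authors' setup.

```python
# Feature-encoder conv layers as (kernel_size, stride) pairs.
# Assumption: standard HuBERT Base / wav2vec 2.0 layout, no padding.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
FEATURE_DIM = 512      # channel dimension of the encoder output
SAMPLE_RATE = 16_000   # assumed input sample rate in Hz


def encoder_frames(num_samples: int) -> int:
    """Return T, the number of output frames for a waveform of num_samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


def encoder_output_shape(seconds: float) -> tuple[int, int]:
    """Return the (512, T) shape of the encoder output for a given duration."""
    return FEATURE_DIM, encoder_frames(int(seconds * SAMPLE_RATE))


if __name__ == "__main__":
    for seconds in (1.0, 2.5, 6.0):
        print(seconds, encoder_output_shape(seconds))
```

Under these assumptions the encoder emits roughly one frame per 20 ms of audio, so T grows linearly with the input duration. Inputs shorter than the encoder's receptive field (a few hundred samples) are not handled by this sketch.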

Multi-CMGAN+/+: Leveraging Multi-Objective Speech Quality Metric Prediction for Speech Enhancement

Close, Ravenscroft, Hain et al. 2024

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Neural network based approaches to speech enhancement have been shown to be particularly powerful, leveraging a data-driven approach to achieve a significant performance gain over other approaches. Such approaches rely on artificially created labelled training data so that the neural model can be trained using intrusive loss functions, which compare the output of the model with clean reference speech. The performance of such systems when enhancing real-world audio often suffers relative to their performance on simulated test data. In this work, a non-intrusive multi-metric prediction approach is introduced, wherein a model is trained on artificial labelled data using inference of an adversarially trained metric prediction neural network. The proposed approach shows improved performance versus state-of-the-art systems on the evaluation sets of the recent CHiME-7 challenge unsupervised domain adaptation speech enhancement (UDASE) task.



“…The best performing non-intrusive approach [12] uses an uncertainty measure derived from state-of-the-art ASR systems as a proxy for human intelligibility, finding a strong correlation between the two measures. Other successful approaches [13,14] make use of powerful feature representations derived from self-supervised speech representations (SSSRs) as inputs to neural speech intelligibility prediction models, while others use neural network structures which have been shown to be useful in the related task of human speech quality rating prediction [15]. CPC2 differs from CPC1 in that its evaluation sets are disjoint in terms of listener and hearing aid system relative to its training sets.…”

Section: Prior Approaches (mentioning)

confidence: 99%

“…A model structure following work on the CPC1 in [14] is chosen for the primary SI prediction network (cf. Section 3.2.1), depicted to the right in Figure 2.…”

Section: Model Structure (mentioning)

confidence: 99%


Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users Using Intermediate ASR Features and Human Memory Models

Mogridge, Close, Sutherland et al. 2024

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


The Effect of Spoken Language on Speech Enhancement Using Self-Supervised Speech Representation Loss Functions

Close, Hain, Goetze 2023

2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

