Interspeech 2021
DOI: 10.21437/interspeech.2021-1878

N-MTTL SI Model: Non-Intrusive Multi-Task Transfer Learning-Based Speech Intelligibility Prediction Model with Scenery Classification

Abstract: Applying speech enhancement algorithms in hearing aids does not always increase speech intelligibility. Therefore, classifying the environment beforehand could be important. However, previous speech intelligibility models provide no additional information about the reason for a decrease in speech intelligibility. We propose a unique non-intrusive multi-task transfer learning-based speech intelligibility prediction model with scenery classification (N-MTTL SI model). The sol…
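The title describes a multi-task set-up in which one model both predicts speech intelligibility (SI) and classifies the scene. The abstract excerpt does not describe the architecture or the training loss, so the following is only a minimal sketch under our own assumptions: an SI regression task trained with mean squared error, a scenery-classification task trained with softmax cross-entropy, and a hypothetical `weight` hyperparameter that trades the two off. It is not the authors' published objective.

```python
import math
from typing import List


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error, used here for the (assumed) SI-score regression task."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits: List[float], label: int) -> float:
    """Softmax cross-entropy for one example of the (assumed) scene-classification task."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def multitask_loss(si_pred, si_target, scene_logits, scene_labels, weight=0.5):
    """Assumed joint objective: SI loss + weight * mean scene-classification loss.

    Not the N-MTTL SI model's published loss; `weight` is a hypothetical hyperparameter.
    """
    l_si = mse(si_pred, si_target)
    l_cls = sum(
        cross_entropy(lg, y) for lg, y in zip(scene_logits, scene_labels)
    ) / len(scene_labels)
    return l_si + weight * l_cls


# Example batch: two utterances, three hypothetical scene classes.
loss = multitask_loss(
    si_pred=[0.8, 0.4],
    si_target=[1.0, 0.5],
    scene_logits=[[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    scene_labels=[0, 1],
)
print(round(loss, 4))
```

With `weight=0` the objective reduces to plain intelligibility regression; a larger `weight` puts more emphasis on the scenery-classification task, which is the kind of auxiliary information the abstract says earlier SI models do not provide.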

Cited by 5 publications (5 citation statements); references 29 publications.

“…Our work also shares connections with the literature on intelligibility prediction based on DNN representations [27,7,28,6,32]. More relevantly, in [6], SSL representations were used and optimized to predict multiple speech intelligibility indices.…”
Section: Related Work (mentioning)

“…The practical implications for intelligibility prediction research are evident from the study. Firstly, the results suggest that SSL representations should be chosen over supervised-learned ones, contrary to what has been done in [7,27] for instance. Secondly, cat(z, z_ref) consistently and significantly outperforming sim(z, z_ref) as a feature for intelligibility prediction, indicates that learned non-linear functions over raw features should be preferred over linear similarity measures.…”
Section: On the Meaning of Our Results (mentioning)

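The excerpt contrasts two ways of turning a representation z of the degraded signal and a representation z_ref of the clean reference into a feature for intelligibility prediction. The citing paper's exact definitions are not given here, so this sketch assumes that sim(z, z_ref) is the mean frame-wise cosine similarity and that cat(z, z_ref) is frame-wise concatenation; the function names `sim_feature` and `cat_feature` are hypothetical.

```python
import math
from typing import List

Frames = List[List[float]]  # one feature vector per frame (assumed layout)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sim_feature(z: Frames, z_ref: Frames) -> float:
    """Hypothetical sim(z, z_ref): mean frame-wise cosine similarity (one scalar)."""
    if len(z) != len(z_ref):
        raise ValueError("z and z_ref must have the same number of frames")
    return sum(cosine(a, b) for a, b in zip(z, z_ref)) / len(z)


def cat_feature(z: Frames, z_ref: Frames) -> Frames:
    """Hypothetical cat(z, z_ref): each degraded frame concatenated with its reference frame."""
    if len(z) != len(z_ref):
        raise ValueError("z and z_ref must have the same number of frames")
    return [list(a) + list(b) for a, b in zip(z, z_ref)]


# Example: two 2-dimensional frames.
z = [[1.0, 0.0], [0.0, 1.0]]
z_ref = [[1.0, 0.0], [1.0, 0.0]]
print(sim_feature(z, z_ref))  # 0.5
print(cat_feature(z, z_ref))  # [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
```

The sim feature fixes the comparison rule in advance and collapses each utterance to a single number, whereas the cat feature keeps both representations intact, so a downstream learned (non-linear) predictor can decide how to compare them. This is the distinction the quoted result draws.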
“…Other NR tools produce estimates of objective values including FR speech quality values [23], [30], [32], [38], [44], [51], [54], [56], [57], FR speech intelligibility values [30], [32], [38], [44], [52], [54], [56], [57], speech transmission index [22], codec bit-rate [46], and detection of specific impairments, artifacts, or noise types [34], [39], [41], [52]. Some of these tools perform a single task and others perform multiple tasks.…”
Section: A. Existing Machine Learning Approaches (mentioning)

“…The non-intrusive speech quality assessment model called NISQA [50] produces estimates of subjective speech quality as well as four constituent dimensions: noisiness, coloration, discontinuity, and loudness. Other NR tools produce estimates of objective values including FR speech quality values [22], [29], [31], [42], [48], FR speech intelligibility values [29], [31], [42], [49], speech transmission index [21], codec bit-rate [43], and detection of specific impairments, artifacts, or noise types [33], [37], [39], [49]. Some of these tools perform a single task and others perform multiple tasks.…”
Section: A. Existing Machine Learning Approaches (mentioning)