Nonintrusive Speech Intelligibility Prediction Using Convolutional Neural Networks

Andersen, Asger Heidemann; Haan, Jan Mark de; Jensen, Jesper

doi:10.1109/taslp.2018.2847459

Cited by 44 publications

(27 citation statements)

References 64 publications

“…Kendall's (τ) values obtained by the N-MTTL SI model (ResNet-18 MT) are 0.80 and 0.69 for seen and unseen conditions, respectively. Overall, the accuracy measures of the model are comparable to the literature [17], where similar values are observed for unseen conditions. However, conditions in our dataset are not identical to [17], therefore direct comparison is not possible.…”

Section: Speech Intelligibility Prediction (supporting)

confidence: 78%

“…The residual block also contains a batch normalization layer and Rectified Linear Unit (ReLU) activation function, which is used after every convolutional layer [26]. Previous work [17] has explored the importance of the convolution layer to extract spectro-temporal patterns in the input signal related to speech intelligibility. Therefore, we expect the convolutional layers of ResNet to be beneficial for both our specific tasks in our N-MTTL SI model.…”

Section: Resnet (Residual Network) (mentioning)

confidence: 99%

“…Spearman rank correlation (ρ) confirms monotonicity between the estimated and labelled intelligibility, which is 0.93/0.85 for the seen/unseen conditions, respectively. Lastly, Kendall's rank correlation coefficient, as used in the speech intelligibility prediction literature [17,18], also expresses the degree of monotonicity in the relation between measurements and predictions. Kendall's (τ) values obtained by the N-MTTL SI model (ResNet-18 MT) are 0.80 and 0.69 for seen and unseen conditions, respectively.…”

Section: Speech Intelligibility Prediction (mentioning)

confidence: 99%
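The accuracy measures named in the statements above (RMSE, Spearman's ρ and Kendall's τ) can be computed as in the following minimal sketch. It is an illustration only, not the citing authors' evaluation code: the function names and the toy scores are hypothetical, and Kendall's coefficient is computed in its simple tau-a form, which ignores tie corrections.

```python
import math
from itertools import combinations


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted scores."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman_rho(measured, predicted):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(measured), average_ranks(predicted))


def kendall_tau_a(measured, predicted):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(measured)), 2):
        sign = (measured[i] - measured[j]) * (predicted[i] - predicted[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(measured) * (len(measured) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up intelligibility scores in percent, NOT data from any cited paper.
measured = [10.0, 35.0, 60.0, 80.0, 95.0]
predicted = [14.0, 30.0, 66.0, 92.0, 90.0]

print("RMSE:", rmse(measured, predicted))
print("Spearman rho:", spearman_rho(measured, predicted))
print("Kendall tau-a:", kendall_tau_a(measured, predicted))
```

Both rank correlations depend only on the ordering of the scores, which is why they are read as measures of monotonicity rather than of absolute error. In practice, library routines such as `scipy.stats.spearmanr` and `scipy.stats.kendalltau` would normally be used instead; note that the latter defaults to the tie-corrected tau-b variant.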

“…Recently, rapid progress in the deep learning domain has also been utilised in the field of speech intelligibility prediction. Several models have been introduced employing DNN [16], CNN [17,18] or U-Net [19]. However, none of the speech intelligibility models proposed so far provide any additional information regarding the cause of the speech intelligibility degradation that could be further used to fine-tune speech enhancement algorithms.…”

Section: Introduction (mentioning)

confidence: 99%

N-MTTL SI Model: Non-Intrusive Multi-Task Transfer Learning-Based Speech Intelligibility Prediction Model with Scenery Classification

Marcinek, Stone, Millman et al. 2021

Interspeech 2021

The application of speech enhancement algorithms for hearing aids may not always be beneficial to increasing speech intelligibility. Therefore, a prior environment classification could be important. However, previous speech intelligibility models do not provide any additional information regarding the reason for a decrease in speech intelligibility. We propose a unique non-intrusive multi-task transfer learning-based speech intelligibility prediction model with scenery classification (N-MTTL SI model). The solution combines a Mel-spectrogram analysis of the degraded speech signal with transfer learning and multi-task learning to provide simultaneous speech intelligibility prediction (task 1) and scenery classification of ten real-world noise conditions (task 2). The model utilises a pre-trained ResNet architecture as an encoder for feature extraction. The prediction accuracy of the N-MTTL SI model for both tasks is high. Specifically, RMSE of speech intelligibility predictions for seen and unseen conditions is 3.76% and 4.06%. The classification accuracy is 98%. In addition, the proposed solution demonstrates the potential of using pre-trained deep learning models in the domain of speech intelligibility prediction.
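The multi-task arrangement described in the abstract, one shared encoder feeding an intelligibility regressor (task 1) and a ten-way scenery classifier (task 2), can be sketched roughly as follows. This is not the authors' implementation: the pre-trained ResNet encoder is replaced by a random stand-in feature vector, and the head shapes, the sigmoid output, the summed loss and the `scene_weight` parameter are all assumptions made for illustration.

```python
import math
import random

NUM_SCENES = 10  # the abstract mentions ten real-world noise conditions


def linear(weights, bias, features):
    """Dense layer: one output value per row of weights."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_forward(features, si_head, scene_head):
    """Feed one shared feature vector to both task heads."""
    si_logit = linear(si_head[0], si_head[1], features)[0]
    # Assumption: intelligibility is expressed as a fraction in (0, 1).
    si_score = 1.0 / (1.0 + math.exp(-si_logit))
    scene_probs = softmax(linear(scene_head[0], scene_head[1], features))
    return si_score, scene_probs


def multitask_loss(si_score, scene_probs, si_target, scene_target,
                   scene_weight=1.0):
    """Squared error for task 1 plus weighted cross-entropy for task 2."""
    regression = (si_score - si_target) ** 2
    classification = -math.log(scene_probs[scene_target])
    return regression + scene_weight * classification


# Hypothetical stand-in for the encoder output; a real system would obtain
# this vector from a pre-trained network applied to a Mel-spectrogram.
rng = random.Random(0)
feature_dim = 8
features = [rng.uniform(-1, 1) for _ in range(feature_dim)]

# Randomly initialised heads as (weights, bias) pairs.
si_head = ([[rng.uniform(-0.5, 0.5) for _ in range(feature_dim)]], [0.0])
scene_head = ([[rng.uniform(-0.5, 0.5) for _ in range(feature_dim)]
               for _ in range(NUM_SCENES)], [0.0] * NUM_SCENES)

si_score, scene_probs = multitask_forward(features, si_head, scene_head)
loss = multitask_loss(si_score, scene_probs, si_target=0.8, scene_target=3)
print("predicted intelligibility:", si_score)
print("most likely scene index:", scene_probs.index(max(scene_probs)))
print("combined loss:", loss)
```

Sharing the encoder means that gradients from both losses would update the same features during training, which is the usual motivation for multi-task learning; how the two terms are actually balanced in the cited work is not stated in the abstract.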

“…However, since there is no reference signal for the receiver such as communication, a non-intrusive intelligibility estimation method is required. In [3], the speech intelligibility is predicted using convolutional neural network which is trained with measured intelligibility scores that humans listen and evaluate. The work in [4] presented the method of speech intelligibility prediction by using automatic speech recognition (ASR) system based deep neural networks.…”

Section: Introduction (mentioning)

confidence: 99%

A Non-Intrusive Speech Intelligibility Estimation Method Based on Deep Learning Using Autoencoder Features

Kim, Yun, Lee et al. 2020

IEICE Trans. Inf. & Syst.

This paper presents a deep learning-based non-intrusive speech intelligibility estimation method using the bottleneck features of an autoencoder. The conventional standard non-intrusive speech intelligibility estimation method, P.563, lacks intelligibility estimation performance in various noise environments. We propose a more accurate speech intelligibility estimation method based on a long short-term memory (LSTM) neural network whose input and output are autoencoder bottleneck features and a short-time objective intelligibility (STOI) score, respectively, where STOI is a standard intrusive measure of speech intelligibility that requires a reference speech signal. We show that the proposed method outperforms the conventional standard P.563 and mel-frequency cepstral coefficient (MFCC) feature-based intelligibility estimation methods for speech signals in various noise environments.

Learning to Predict Speech Intelligibility from Speech Distortions

Kuriakose 2023

Lecture Notes in Computer Science
