2018
DOI: 10.1587/transinf.2017edl8225

A Deep Learning-Based Approach to Non-Intrusive Objective Speech Intelligibility Estimation

Abstract: This paper proposes a deep learning-based non-intrusive objective speech intelligibility estimation method based on recurrent neural network (RNN) with long short-term memory (LSTM) structure. Conventional non-intrusive estimation methods such as standard P.563 have poor estimation performance and lack of consistency, especially, in various noise and reverberation environments. The proposed method trains the LSTM RNN model parameters by utilizing the STOI that is the standard intrusive intelligibility estimati…

Cited by 13 publications (11 citation statements)
References 4 publications (4 reference statements)
“…We refrain from fundamental introduction to the concept here and refer the reader to other sources (for example Shanmuganathan (2016); Basheer and Hajmeer (2000)). To fit the ANN to the training data, we use the optimizer Adam (Yun et al, 2018) with standard parameters. As a loss function, the mean squared error and as activation functions, Rectified Linear Units (ReLu) are used.…”
Section: Artificial Neural Network (mentioning)
confidence: 99%
“…Then, we used contrast‐limited adaptive histogram equalization (CLAHE) to enhance the contrast of the database. To optimize the network, we used differentiable soft‐dice loss and an ADAM optimizer with the following parameters: base learning rate = 2 × 10⁻⁴ and number of epochs = 200. Weights of the network were initialized from a Gaussian distribution with a zero mean and a SD of 0.001.…”
Section: Methods (mentioning)
confidence: 99%
“…The work in [4] presented the method of speech intelligibility prediction by using automatic speech recognition (ASR) system based deep neural networks. Recently, the non-intrusive speech intelligibility estimation method based on a recurrent neural network (RNN) with a mel-frequency cepstrum coefficient (MFCC) vector was proposed [5]. In this paper, we propose a novel method of estimating intelligibility score with higher performance than conventional methods in noise environments based on deep learning using autoencoder bottleneck feature and STOI values.…”
Section: Introduction (mentioning)
confidence: 99%