Use of a Deep Recurrent Neural Network to Reduce Wind Noise: Effects on Judged Speech Intelligibility and Sound Quality

Keshavarzi, Mahmoud; Goehring, Tobias; Zakis, Justin A.; Turner, Richard E.; Moore, Brian C. J.

doi:10.1177/2331216518770964

Cited by 21 publications

(15 citation statements)

References 37 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The LSTM processed a 5-timestep input where each timestep was related to acoustic features extracted from a single frame of the input signal (noisy speech); steps 1, 2, 3, 4, and 5 corresponded to successive frames j -4, j -3, j -2, j -1, and j , respectively. We selected this architecture based on previous studies using HI listeners (Keshavarzi et al ., 2018; 2019). The RNN estimated the IRM for frame j as its output (estimated ratio mask, ERM).…”

Section: Algorithm Descriptionmentioning

confidence: 99%

“…RNN-LSTM algorithms have shown improved generalization using objective measures, but have not been evaluated in listening studies with CI users. However, similar types of LSTM-RNNs have recently been shown to provide benefits for speech-in-noise perception for HI listeners (Bramslow et al ., 2018; Keshavarzi et al ., 2018; 2019; Healy et al ., 2019), and they represent a promising way for improving performance for CI users in conditions with non-stationary noise that was not included in the training data.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Using recurrent neural networks to improve the perception of speech in non-stationary noise by people with cochlear implants

Goehring

Keshavarzi

Carlyon

et al. 2019

The Journal of the Acoustical Society of America

Self Cite

View full text Add to dashboard Cite

Speech-in-noise perception is a major problem for users of cochlear implants (CIs), especially with non-stationary background noise. Noise-reduction algorithms have produced benefits but relied on a priori information about the target speaker and/or background noise. We developed a recurrent neural network (RNN) algorithm for enhancing speech in non-stationary noise and evaluated its benefits for speech perception, using both objective measures and experiments with CI simulations and CI users. The RNN was trained using speech from many talkers mixed with multi-talker or traffic noise recordings. Its performance was evaluated using speech from a novel talker mixed with novel noise recordings of the same class, either babble or traffic noise. Objective measures indicated benefits of using a recurrent over a feed-forward architecture and predicted better speech intelligibility with than without the processing. The experimental results showed significantly improved intelligibility of speech in babble noise but not in traffic noise. CI subjects rated the processed stimuli as significantly better in terms of speech distortions, noise intrusiveness and overall quality than unprocessed stimuli for both babble and traffic noise. These results extend previous findings for CI users to mostly unseen acoustic conditions with non-stationary noise.

show abstract

Section: Algorithm Descriptionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Using recurrent neural networks to improve the perception of speech in non-stationary noise by people with cochlear implants

Goehring

Keshavarzi

Carlyon

et al. 2019

The Journal of the Acoustical Society of America

Self Cite

View full text Add to dashboard Cite

show abstract

“…The LSTM processed a five-timestep input wherein each timestep was related to acoustic features extracted from a single frame of the input signal (noisy speech); steps 1, 2, 3, 4, and 5 corresponded to successive frames j-4, j-3, j-2, j-1, and j, respectively. We selected this architecture based on previous studies using HI listeners (Keshavarzi et al, 2018;Keshavarzi et al, 2019). The RNN estimated the IRM for frame j as its output (estimated ratio mask, ERM).…”

Section: A Signal Processing and Rnn Architecturementioning

confidence: 99%

“…RNN-LSTM algorithms have shown improved generalization using objective measures, but have not been evaluated in listening studies with CI users. However, similar types of LSTM-RNNs have recently been shown to provide benefits for speech-in-noise perception for HI listeners (Bramsløw et al, 2018;Keshavarzi et al, 2018;Keshavarzi et al, 2019;Healy et al, 2019), and they represent a promising way for improving performance for CI users in conditions with non-stationary noise that was not included in the training data.…”

Section: Introductionmentioning

confidence: 99%

Using recurrent neural networks to improve the perception of speech in non-stationary noise by people with cochlear implants

Goehring¹,

Keshavarzi²,

Carlyon³

et al. 2019

Preprint

Self Cite

View full text Add to dashboard Cite

Speech-in-noise perception is a major problem for users of cochlear implants (CIs), especially with non-stationary background noise such as competing talkers or traffic. Algorithms that facilitate speech perception by attenuating background noise have produced benefits but relied on a priori information about the target speaker and/or background noise. We developed a recurrent neural network (RNN) algorithm for enhancing speech in non-stationary noise and evaluated its benefits for speech perception, using objective measures, experiments with normal-hearing (NH) subjects listening to CI simulations and experiments with CI users. The RNN was trained using a data set that included speech from many talkers mixed with a set of real-world multi-talker or traffic noise recordings. Its performance was evaluated using speech from a novel talker mixed with novel noise recordings of the same class, either babble or traffic noise. The signal-to-noise ratios also differed from those used for training. Objective measures indicated benefits of using a recurrent architecture over a simpler feed-forward architecture and predicted better speech intelligibility than for the unprocessed speech in noise. The experimental results showed significant improvements in speech intelligibility for the speech in babble for both CI simulations and CI subjects; speech reception thresholds were improved by 1.4 to 3.4 dB. There was no significant improvement for the traffic noise. CI subjects rated stimuli processed using the RNN algorithm as significantly better in terms of speech distortions, noise intrusiveness and overall quality than unprocessed stimuli for both babble and traffic noise, with larger improvements for the former. These results extend previous findings and indicate benefits in speech-in-noise performance by CI listeners for mostly unseen acoustic conditions when using a speaker-independent algorithm that was optimized for non-stationary noises.

show abstract

“…Statistical methods can predict the concentration of air pollutants including PM 2.5 by analyzing air quality related data and have received extensive attention from scholars. [15] models, autoregressive moving average (ARIMA) [16] models, land use regression (LUR) [17] models, generalized additive 4 models (GAM) [18,19], support vector regression (SVR) models [20], artificial neural network (ANN) models [21] in machine learning, recurrent neural network (RNN) [22,23], convolutional neural networks (CNN) [24,25] and long-term memory neural network (LSTM) [26,27] in deep learning [28].…”

Section: Introductionmentioning

confidence: 99%

PM2.5 concentration forecasting using Long Short-Term Memory Neural Network and Multi-Level Additive Model

Tao¹,

Tian²,

Wu³

et al. 2019

Preprint

View full text Add to dashboard Cite

Background PM 2.5 concentration predication can provide an effective way to protect public health by early warning. Though there are many methods available, the comparison between multi-level additive model (AM) and long short-term memory (LSTM) neural network in predicting PM 2.5 concentration is limited. This study aimed to compare the performance of multi-level AM and LSTM in predicting hourly and daily PM 2.5 concentration.Methods Air pollution data from Jul 1, 2016 to Dec 31, 2017 were obtained from Beijing Municipal Environmental Monitoring Center, and meteorological data were derived from the National Meteorological Science Data Sharing Service. Multi-level AM and LSTM were developed to estimate the regional hourly and daily concentration of PM 2.5 .Results In the prediction of hourly PM 2.5 concentrations, LSTM achieved a better performance than multi-level AM (range of R 2 : 0.76-0.92 for LSTM, 0.59-0.78 for multi-level AM; range of root mean square error (RMSE): 6.20-17.58μg/m 3 for LSTM, 19.19-30.81μg/m 3 for multi-level AM; range of mean absolute error (MAE): 4.50-13.42μg/m 3 for LSTM, 13.55-22.35μg/m 3 for multi-level AM; range of mean absolute percentage error (MAPE): 0.18%-0.55% for LSTM, 0.50%-0.87% for multi-level AM). While in the prediction of daily PM 2.5 concentrations, multi-level AM showed a higher predictive accuracy than LSTM (range of R 2 : 0.43-0.93 for LSTM, 0.74-0.98 for multi-level AM; range of RMSE: 32.46-46.82μg/m 3 for LSTM, 4.83-20.98μg/m 3 for multi-level AM; range of MAE: 24.32-34.89μg/m 3 for LSTM, 3.67-16.33μg/m 3 for multi-level AM; range of MAPE: 0.92%-1.74% for LSTM, 0.11%-0.45% for multi-level AM).Conclusion LSTM showed better performance than the multi-level AM when there is a large amount of data, while multi-level AM showed better performance than LSTM when the amount of data is relatively small.

show abstract

Use of a Deep Recurrent Neural Network to Reduce Wind Noise: Effects on Judged Speech Intelligibility and Sound Quality

Cited by 21 publications

References 37 publications

Using recurrent neural networks to improve the perception of speech in non-stationary noise by people with cochlear implants

Using recurrent neural networks to improve the perception of speech in non-stationary noise by people with cochlear implants

Using recurrent neural networks to improve the perception of speech in non-stationary noise by people with cochlear implants

PM2.5 concentration forecasting using Long Short-Term Memory Neural Network and Multi-Level Additive Model

Contact Info

Product

Resources

About