Multi-resolution auditory cepstral coefficient and adaptive mask for speech enhancement with deep neural network

Li, Ruwei; Sun, Xiaoyue; Liu, Yanan; Yang, Der‐Ching; Dong, Liang

doi:10.1186/s13634-019-0618-4

Cited by 6 publications (2 citation statements)

References 22 publications (18 reference statements)


“…A similar method of DNN-based mask estimation was done in past studies [19][20][21][22][23]. A study in [19] modified the feature and proposed adaptive masks in the DNN-based mask estimation with four hidden layers and 1024 hidden nodes. As a result, average PESQ and STOI scores of 2.12 and 0.78, respectively, were obtained at -5 dB SNR [19].…”

Section: Related Work (mentioning; confidence: 99%)
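The statement above describes DNN-based mask estimation with four hidden layers of 1024 nodes. A minimal sketch of the forward pass of such a network follows, assuming ReLU hidden units, a sigmoid output layer that yields one mask value per frequency channel, and random (untrained) weights. The activations, the initialisation, and the function names `init_mlp` and `estimate_mask` are hypothetical illustrations, not details taken from [19].

```python
import math
import random


def init_mlp(n_in, n_out, n_hidden=1024, n_layers=4, seed=0):
    """Randomly initialise a fully connected net: n_layers hidden layers of
    n_hidden units, followed by one output layer. Returns [(W, b), ...]."""
    rng = random.Random(seed)
    sizes = [n_in] + [n_hidden] * n_layers + [n_out]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / fan_in)  # He-style scaling suited to ReLU units
        W = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(fan_out)]
        b = [0.0] * fan_out
        layers.append((W, b))
    return layers


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def estimate_mask(layers, features):
    """Map one frame of input features to one mask value per output channel."""
    h = features
    last = len(layers) - 1
    for i, (W, b) in enumerate(layers):
        z = [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]
        # ReLU on hidden layers (assumed); sigmoid on the output keeps the mask in [0, 1].
        h = [sigmoid(v) for v in z] if i == last else [max(0.0, v) for v in z]
    return h
```

For example, `estimate_mask(init_mlp(n_in=64, n_out=64), [0.1] * 64)` returns 64 mask values (the 64-dimensional input is itself an assumption). With untrained weights the values only illustrate the data flow; training against target masks is omitted.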

“…A study in [19] modified the feature and proposed adaptive masks in the DNN-based mask estimation with four hidden layers and 1024 hidden nodes. As a result, average PESQ and STOI scores of 2.12 and 0.78, respectively, were obtained at -5 dB SNR [19]. Another study in [23] used the feature fusion technique for the DNN input, while the phase-aware and magnitude masks were applied as the target masks.…”

Section: Related Work (mentioning; confidence: 99%)
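The statement above says [23] used a phase-aware and a magnitude mask as training targets, but the excerpt does not define them. As an illustration only, here are two widely used targets computed from complex time-frequency bins of clean speech S and noisy mixture Y: the spectral magnitude mask |S|/|Y| and the phase-sensitive mask |S|/|Y|·cos(θ_S − θ_Y). Whether [23] uses exactly these forms is an assumption, and the function names are hypothetical.

```python
import cmath
import math


def magnitude_mask(S, Y, eps=1e-12):
    """Spectral magnitude mask |S| / |Y| for each complex time-frequency bin."""
    return [abs(s) / (abs(y) + eps) for s, y in zip(S, Y)]


def phase_sensitive_mask(S, Y, eps=1e-12, clip=None):
    """Phase-sensitive mask |S| / |Y| * cos(theta_S - theta_Y).

    The cosine term shrinks the target when the noisy phase deviates from the
    clean phase. Pass clip=(lo, hi) to truncate the values, as is often done.
    """
    out = []
    for s, y in zip(S, Y):
        m = abs(s) / (abs(y) + eps) * math.cos(cmath.phase(s) - cmath.phase(y))
        if clip is not None:
            m = min(max(m, clip[0]), clip[1])
        out.append(m)
    return out
```

In practice S and Y would be STFT coefficients of the clean and noisy signals; here they are plain lists of complex numbers. Note that the phase-sensitive mask can go negative when the phases differ by more than 90 degrees, which is why it is often clipped to [0, 1].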


A Hybrid Approach for Single Channel Speech Enhancement using Deep Neural Network and Harmonic Regeneration Noise Reduction

Jamal, Fuad, Sha’abani

2020

IJACSA


This paper presents a hybrid approach for single-channel speech enhancement using a deep neural network (DNN) and harmonic regeneration noise reduction (HRNR). The DNN was used as a supervised algorithm to predict a new target mask, the constrained Wiener filter (cWF) mask, from the noisy mixture signal after it was transformed into gammatone filter bank features. The HRNR algorithm was then applied as a post-filter to eliminate residual noise. DNN-based supervised speech enhancement is an emerging approach for coping with heavy nonstationary noise and low signal-to-noise ratio (SNR) conditions. To validate the proposed algorithm and its new target mask, 600 Malay utterances from male and female speakers were used in the training session, while 120 Malay utterances were used in the prediction session. Short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores were calculated as the performance metrics. The proposed target mask outperformed the other baseline target masks, and the hybrid speech enhancement algorithm achieved PESQ and STOI scores of 1.17 and 0.79, respectively, at -5 dB SNR with babble noise.
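The abstract reports results at -5 dB babble-noise SNR and trains a DNN to predict a constrained Wiener filter mask. The excerpt does not define the constraint, so the sketch below shows only two standard ingredients under stated assumptions: mixing speech and noise at a chosen global SNR, and the unconstrained Wiener gain |S|²/(|S|² + |N|²) per time-frequency bin. The function names are hypothetical, and the cWF mask used in the paper may differ from this plain Wiener gain.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` (same length as `speech`) so that
    10*log10(P_speech / P_noise) == snr_db; return (mixture, scaled_noise)."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Target noise power is p_speech / 10**(snr_db / 10); solve for the amplitude gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    return [s + n for s, n in zip(speech, scaled)], scaled


def measure_snr_db(speech, noise):
    """Global SNR in dB between a clean signal and an additive noise signal."""
    return 10 * math.log10(sum(s * s for s in speech) / sum(n * n for n in noise))


def wiener_mask(speech_power, noise_power, eps=1e-12):
    """Unconstrained Wiener gain |S|^2 / (|S|^2 + |N|^2) per time-frequency bin."""
    return [ps / (ps + pn + eps) for ps, pn in zip(speech_power, noise_power)]
```

A bin whose local SNR is -5 dB (speech power about 0.316 times the noise power) receives a gain of about 0.24, so the mask attenuates it strongly. Multiplying the mask by the noisy magnitude spectrum gives the enhanced magnitude estimate.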
