Single channel speech enhancement using convolutional neural network

Kounovsky, Tomas; Málek, Jiří

doi:10.1109/ecmsm.2017.7945915

Cited by 53 publications (30 citation statements)

References 15 publications


“…Moreover, a similar encoder-decoder architecture is developed in [21]. Other studies [9], [38], [24], [1], [14], [15] using CNN for mask estimation or spectral mapping also achieve small performance improvements over a DNN. Recently, Fu et al.…”
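The excerpt contrasts two CNN training targets: mask estimation and spectral mapping. The sketch below is a minimal illustration, not the method of any cited work. It assumes the ideal ratio mask (IRM) with exponent beta = 0.5 as the mask target and the clean log-power spectrum as the mapping target, and it works on toy per-bin magnitudes in plain Python lists; all function names and values are hypothetical.

```python
import math

def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5, eps=1e-12):
    """Ideal ratio mask per frequency bin: (S^2 / (S^2 + N^2)) ** beta."""
    return [(s * s / (s * s + n * n + eps)) ** beta for s, n in zip(clean_mag, noise_mag)]

def apply_mask(noisy_mag, mask):
    """Mask estimation: a network predicts the mask, which scales the noisy magnitude."""
    return [m * y for m, y in zip(mask, noisy_mag)]

def mapping_target(clean_mag, eps=1e-12):
    """Spectral mapping: a network regresses the clean log-power spectrum directly."""
    return [math.log(s * s + eps) for s in clean_mag]

# One toy frame with three frequency bins (hypothetical values).
clean = [1.0, 0.0, 3.0]
noise = [1.0, 2.0, 0.0]
# Assumes speech and noise are uncorrelated, so |Y|^2 is approximately |S|^2 + |N|^2.
noisy = [math.sqrt(s * s + n * n) for s, n in zip(clean, noise)]

mask = ideal_ratio_mask(clean, noise)   # approximately [0.707, 0.0, 1.0]
enhanced = apply_mask(noisy, mask)      # approximately [1.0, 0.0, 3.0]
target = mapping_target(clean)          # what a mapping network would be trained to output
```

With an oracle mask the enhanced magnitude recovers the clean one; in practice the network only estimates the mask (or the mapping target), which is where the CNN-versus-DNN comparison in the excerpt applies.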

Section: Introduction (mentioning), confidence: 99%

Gated Residual Networks With Dilated Convolutions for Monaural Speech Enhancement

Tan, Chen, Wang (2019). IEEE/ACM Trans. Audio Speech Lang. Process.

158


For supervised speech enhancement, contextual information is important for accurate mask estimation or spectral mapping. However, commonly used deep neural networks (DNNs) are limited in capturing temporal contexts. To leverage long-term contexts for tracking a target speaker, we treat speech enhancement as a sequence-to-sequence mapping, and present a novel convolutional neural network (CNN) architecture for monaural speech enhancement. The key idea is to systematically aggregate contexts through dilated convolutions, which significantly expand receptive fields. The CNN model additionally incorporates gating mechanisms and residual learning. Our experimental results suggest that the proposed model generalizes well to untrained noises and untrained speakers. It consistently outperforms a DNN, a unidirectional long short-term memory (LSTM) model and a bidirectional LSTM model in terms of objective speech intelligibility and quality metrics. Moreover, the proposed model has far fewer parameters than DNN and LSTM models.
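The abstract credits dilated convolutions with expanding the receptive field. For a stack of 1-D convolutions with kernel size k and dilation rates d_1, ..., d_L, the receptive field is 1 + (k - 1)(d_1 + ... + d_L), so doubling the dilation in each layer grows it exponentially with depth while the parameter count grows only linearly. The sketch below checks that arithmetic with an impulse and adds a gated residual block. It assumes causal convolutions and a WaveNet-style tanh/sigmoid gate, which may differ from the exact design in Tan et al.; all names, dilation rates, and weights are hypothetical.

```python
import math

def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked 1-D convolutions with the given dilation rates."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def dilated_conv1d(x, weights, dilation):
    """Causal dilated 1-D convolution with implicit zero padding: y[t] = sum_j w[j] * x[t - j*d]."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_residual_block(x, w_filter, w_gate, dilation):
    """Gated block with a residual connection: x + tanh(filter(x)) * sigmoid(gate(x))."""
    f = dilated_conv1d(x, w_filter, dilation)
    g = dilated_conv1d(x, w_gate, dilation)
    return [xi + math.tanh(fi) * sigmoid(gi) for xi, fi, gi in zip(x, f, g)]

dilations = [2 ** i for i in range(6)]   # 1, 2, 4, ..., 32 (hypothetical stack)
print(receptive_field(3, dilations))      # 127 frames
print(receptive_field(3, [1] * 6))        # 13 frames without dilation

# Impulse check: with all-ones kernels, an impulse at t = 0 spreads to t = RF - 1.
h = [1.0] + [0.0] * 200
for d in dilations:
    h = dilated_conv1d(h, [1.0, 1.0, 1.0], d)
last_nonzero = max(t for t, v in enumerate(h) if v != 0.0)
print(last_nonzero + 1)                   # 127, matching receptive_field

# One gated residual block on a short toy sequence (hypothetical weights).
y = gated_residual_block([0.5, -0.2, 0.1, 0.0], [0.3, 0.1, -0.2], [0.2, 0.2, 0.2], dilation=2)
```

The residual connection keeps the block output the same length and scale as its input, which is what allows many such blocks to be stacked.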


“…Recent applications confirm that CNNs show a good modeling ability for acoustic problems and can outperform state-of-the-art algorithms in this context. Such applications include speech dereverberation [40]-[42], speech enhancement [43]-[45].…”

Section: Convolutional Neural Network (mentioning), confidence: 99%

Room Acoustical Parameter Estimation From Room Impulse Responses Using Deep Neural Networks

Kleijn (2021). IEEE/ACM Trans. Audio Speech Lang. Process.


“…method is compared to noisy speech, conventional single channel SE based on Log-MMSE [9] and dual-microphone method like spectral coherence [45]. SE methods based on DNN [19], single channel CNN based denoising auto encoder (CNN-DAE) [26], and Multi-Objective Learning-based DNN SE [27] methods are implemented and included for comparison. The deep learning-based SE methods were trained on the same datasets as that of proposed method.…”

Section: B. Offline Objective Evaluation (mentioning), confidence: 99%

“…Researchers have also considered CNN-based end-to-end approach to SE that requires just the raw audio data [25]. Linguistic training and testing of SE based on CNN [26] concluded that the performance of monolingual trained models was on par with multilingual models which makes it better than DNNs. Several features like LPS, Mel Frequency cepstral Coefficients (MFCC), Gammatone Frequency cepstral coefficients (GFCCs) and IBM were employed in multiobjective learning for SE with a DNN architecture to improve the performance in terms of quality and intelligibility of speech [27], [28].…”
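The excerpt lists log power spectra (LPS) and the ideal binary mask (IBM) among the features used in multi-objective learning. Below is a minimal sketch of both, assuming a Hann-windowed naive DFT on a single 32-sample toy frame and a local criterion of 0 dB for the IBM; these settings and all names are illustrative assumptions, and MFCC and GFCC extraction are omitted.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def power_spectrum(frame):
    """Naive DFT power |X[k]|^2 for bins k = 0..N/2 of a Hann-windowed frame."""
    n = len(frame)
    win = hann(n)
    return [
        abs(sum(frame[t] * win[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]

def log_power_spectrum(frame, eps=1e-10):
    """LPS feature: natural log of the power spectrum (eps avoids log(0))."""
    return [math.log(p + eps) for p in power_spectrum(frame)]

def ideal_binary_mask(clean_pow, noise_pow, lc_db=0.0, eps=1e-10):
    """IBM: 1 where the local SNR in dB exceeds the local criterion lc_db, else 0."""
    return [1 if 10 * math.log10((s + eps) / (n + eps)) > lc_db else 0
            for s, n in zip(clean_pow, noise_pow)]

# Toy 32-sample frame: "speech" is a sinusoid on bin 4, "noise" a sinusoid on bin 10.
N = 32
speech = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
noise = [0.5 * math.sin(2 * math.pi * 10 * t / N) for t in range(N)]
noisy = [s + v for s, v in zip(speech, noise)]

lps = log_power_spectrum(noisy)   # 17 input features for bins 0..16
ibm = ideal_binary_mask(power_spectrum(speech), power_spectrum(noise))
```

The IBM needs the separate clean and noise signals, so it can only serve as a training target; the LPS of the noisy frame is what the network sees at inference.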

Section: Introduction (mentioning), confidence: 99%

A Real-Time Convolutional Neural Network Based Speech Enhancement for Hearing Impaired Listeners Using Smartphone

et al. 2019


This paper presents a Speech Enhancement (SE) technique based on a multi-objective learning convolutional neural network to improve the overall quality of speech perceived by Hearing Aid (HA) users. The proposed method is implemented on a smartphone as an application that performs real-time SE. This arrangement works as an assistive tool to HA. A multi-objective learning architecture including primary and secondary features uses a mapping-based convolutional neural network (CNN) model to remove noise from a noisy speech spectrum. The algorithm is computationally fast and has a low processing delay, which enables it to operate seamlessly on a smartphone. The steps and the detailed analysis of real-time implementation are discussed. The proposed method is compared with existing conventional and neural network-based SE techniques through speech quality and intelligibility metrics in various noisy speech conditions. The key contribution of this paper includes the realization of a CNN SE model on a smartphone processor that works seamlessly with HA. The experimental results demonstrate significant improvements over the state-of-the-art techniques and reflect the usability of the developed SE application in noisy environments.

Index Terms: Convolutional neural network (CNN), speech enhancement (SE), hearing aid (HA), smartphone, real-time implementation, log power spectra (LPS).
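The abstract describes a mapping-based CNN trained on primary and secondary features. One common way to realize multi-objective learning is a weighted sum of per-target regression losses. The sketch below assumes LPS as the primary target, MFCC as a secondary target, mean squared error for each term, and hand-picked weights; these choices and all names are hypothetical and are not taken from the paper.

```python
def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def multi_objective_loss(outputs, targets, weights):
    """Weighted sum of per-target MSE terms; keys name the primary and secondary targets."""
    return sum(w * mse(outputs[name], targets[name]) for name, w in weights.items())

# Hypothetical network outputs and training targets for one frame.
outputs = {"lps": [1.0, 2.0, 3.0], "mfcc": [0.5, 0.5]}
targets = {"lps": [1.0, 2.5, 2.0], "mfcc": [0.0, 1.0]}
# Primary target (LPS) weighted 1.0; secondary target (MFCC) weighted 0.1 (assumed values).
weights = {"lps": 1.0, "mfcc": 0.1}

loss = multi_objective_loss(outputs, targets, weights)
# lps term: (0 + 0.25 + 1.0) / 3, about 0.4167; mfcc term: 0.1 * (0.25 + 0.25) / 2 = 0.025
print(round(loss, 4))  # 0.4417
```

In setups of this kind the secondary outputs typically act only as auxiliary training signals, and the primary LPS estimate is what would be used to reconstruct the enhanced speech.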
