“…In this section, we present the forward blind source separation (BSS) structure and we give its full formulation and optimal solutions in the time domain. This structure is intensively used in acoustic noise cancellation [10,16–19]. The two-channel forward BSS structure is presented in Figure 2. At the output of this structure, the…”
Recently, the acoustic noise reduction problem has been treated by two-channel forward blind source separation (BSS) techniques combined with the normalized least mean square algorithm (T-FNLMS). The T-FNLMS algorithm shows good performance on two-channel convolutive, dispersive mixtures. In this paper, we propose a new BSS structure based on the two-channel sparse normalized least mean square algorithm (TS-NLMS). The TS-NLMS algorithm is designed specifically for the case where the convolutive mixing system is characterized by sparse impulse responses. To confirm the good performance of the proposed algorithm, extensive acoustic noise reduction experiments are carried out.
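The abstract above builds on the normalized least mean square (NLMS) update. As a point of reference, here is a minimal sketch of the standard single-filter NLMS recursion, not the two-channel T-FNLMS or sparse TS-NLMS variants of the paper; the 16-tap sparse filter `h`, the step size, and the signal length are all illustrative assumptions.

```python
import numpy as np

def nlms_step(w, x, d, mu=0.5, eps=1e-8):
    """One normalized LMS update: adapt filter w so that w @ x tracks d."""
    e = d - w @ x                        # a-priori estimation error
    w = w + mu * e * x / (x @ x + eps)   # step size normalized by input energy
    return w, e

# Identify a sparse 16-tap impulse response, as in a convolutive mixing path.
rng = np.random.default_rng(0)
h = np.zeros(16)
h[0], h[5] = 1.0, -0.5                   # sparse "mixing" filter (illustrative)
x_sig = rng.standard_normal(4000)
w = np.zeros(16)
for n in range(16, len(x_sig)):
    x = x_sig[n - 16:n][::-1]            # most recent sample first
    w, e = nlms_step(w, x, h @ x)        # desired signal = filtered input
```

The energy normalization `x @ x` is what distinguishes NLMS from plain LMS: it makes the effective step size independent of the input power, which matters for nonstationary signals such as speech. Sparse variants such as the TS-NLMS described above additionally exploit the fact that most taps of `h` are (near) zero.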
“…Nowadays, speech enhancement has been widely used in the fields of speech analysis, speech recognition, speech communication, and so forth. The aim of speech enhancement is to recover speech and improve its quality and intelligibility via different techniques and algorithms, such as unsupervised methods including spectral subtraction [1,2], Wiener filtering [3], statistical model-based estimation [4,5], the subband forward algorithm [6], and the subspace method [5,7]. Generally, these unsupervised methods are based on statistical signal processing and typically work in the frequency domain.…”
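Of the unsupervised methods listed above, spectral subtraction is the simplest to illustrate. The sketch below is a textbook magnitude-subtraction variant with Hann-windowed overlap-add, not the exact algorithm of any cited reference; the frame size, spectral floor, and oracle noise-magnitude estimate are illustrative assumptions.

```python
import numpy as np

def spectral_subtract(noisy, noise_mag, frame=256, hop=128, floor=0.05):
    """Magnitude spectral subtraction: subtract a noise magnitude estimate
    per frequency bin, keep the noisy phase, overlap-add the frames back."""
    win = np.hanning(frame)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame + 1, hop):
        spec = np.fft.rfft(noisy[start:start + frame] * win)
        mag, phase = np.abs(spec), np.angle(spec)
        clean_mag = np.maximum(mag - noise_mag, floor * mag)  # spectral floor
        seg = np.fft.irfft(clean_mag * np.exp(1j * phase), frame)
        out[start:start + frame] += seg * win
        norm[start:start + frame] += win ** 2
    return out / np.maximum(norm, 1e-8)

# Illustrative use: a sinusoid in white noise, oracle noise magnitude estimate.
rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 0.05 * np.arange(8192))
noise = 0.3 * rng.standard_normal(8192)
win = np.hanning(256)
noise_mag = np.mean([np.abs(np.fft.rfft(noise[s:s + 256] * win))
                     for s in range(0, 8192 - 256, 128)], axis=0)
enhanced = spectral_subtract(clean + noise, noise_mag)
```

Keeping the noisy phase while denoising only the magnitude is the classic design choice here, and it is exactly the limitation that the phase-aware methods discussed below try to address.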
Recently, supervised learning methods, especially deep neural network (DNN)-based methods, have shown promising performance in single-channel speech enhancement. Generally, these approaches extract acoustic features directly from the noisy speech to train a magnitude-aware target. In this paper, we propose to extract acoustic features not only from the noisy speech but also from the pre-estimated speech, noise, and phase separately, and then fuse them into a new complementary feature in order to obtain a more discriminative acoustic representation. In addition to learning a magnitude-aware target, we also utilize the fused feature to learn a phase-aware target, thereby further improving the accuracy of the recovered speech. We conduct extensive experiments, including performance comparisons with typical existing methods, generalization evaluation on unseen noise, an ablation study, and subjective tests with human listeners, to demonstrate the feasibility and effectiveness of the proposed method. Experimental results show that the proposed method improves the quality and intelligibility of the reconstructed speech.
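The abstract does not specify how the complementary feature is assembled, so the sketch below is a hypothetical fusion: per-frame concatenation of log-magnitude features from the noisy input and the pre-estimates, with the pre-estimated phase encoded as a cosine/sine pair. The function name, feature choices, and dimensions are all assumptions for illustration.

```python
import numpy as np

def fuse_features(noisy_mag, est_speech_mag, est_noise_mag, est_phase, eps=1e-8):
    """Hypothetical fusion: concatenate per-frame features along the last axis."""
    return np.concatenate([
        np.log(noisy_mag + eps),        # features from the noisy speech
        np.log(est_speech_mag + eps),   # from the pre-estimated speech
        np.log(est_noise_mag + eps),    # from the pre-estimated noise
        np.cos(est_phase),              # phase encoded on the unit circle,
        np.sin(est_phase),              # avoiding the 2*pi wrap discontinuity
    ], axis=-1)

frames, bins = 100, 129                 # e.g. a 256-point STFT gives 129 bins
rng = np.random.default_rng(2)
feat = fuse_features(rng.random((frames, bins)) + 0.1,
                     rng.random((frames, bins)) + 0.1,
                     rng.random((frames, bins)) + 0.1,
                     rng.uniform(-np.pi, np.pi, (frames, bins)))
```

Encoding phase as (cos, sin) rather than the raw angle is a common trick when a network must regress a phase-aware target, since the raw angle is discontinuous at ±π.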
“…The performances are evaluated by considering an IEEE corpus, the GRID audio-visual corpus, and different types of noises. The proposed approach significantly improves objective speech quality and intelligibility and outperforms the conventional STFT-NMF, DWPT-NMF, and DNN-IRM methods. Keywords: Dual-tree complex wavelet transform (DTCWT); discrete wavelet packet transform (DWPT); stationary wavelet transform (SWT); speech enhancement (SE) … estimation [5], sparseness and temporal gradient regularization method [6], Wiener filtering [7], subband forward algorithm [8], and subspace method [9]. These methods consist of two parts: noise tracking and signal gain estimation.…”
In this paper, we propose a novel speech enhancement method based on the dual-tree complex wavelet transform (DTCWT) and nonnegative matrix factorization (NMF) that exploits a subband smooth ratio mask (ssRM) through a joint learning process. The discrete wavelet packet transform (DWPT) suffers from a lack of shift invariance, due to downsampling after the filtering process, which results in a reconstructed signal with significant noise. The redundant stationary wavelet transform (SWT) can solve this shift-invariance problem. In this respect, we use the efficient DTCWT, which offers shift invariance with limited redundancy, and calculate the ratio masks (RMs) between the clean training speech and the noisy speech (i.e., training noise mixed with clean speech). We also compute RMs between the noise and the noisy speech, and then learn both RMs together with the corresponding clean training speech and noise. An auto-regressive moving average (ARMA) filtering process is applied to the previously generated matrices before NMF for smooth decomposition. The ssRM is proposed to exploit the advantage of jointly using the standard ratio mask (sRM) and the square-root ratio mask (srRM). In short, the DTCWT produces a set of subband signals from the time-domain signal. Subsequently, a framing scheme is applied to each subband signal to form matrices, and the RMs are calculated before concatenation with the previously generated matrices. The ARMA filter is applied to the nonnegative matrix, which is formed by taking absolute values. Through the ssRM, speech components are detected using NMF in each newly formed matrix. Finally, the enhanced speech signal is obtained via the inverse DTCWT (IDTCWT). The performances are evaluated by considering an IEEE corpus, the GRID audio-visual corpus, and different types of noises. The proposed approach significantly improves objective speech quality and intelligibility and outperforms the conventional STFT-NMF, DWPT-NMF, and DNN-IRM methods.
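The sRM and srRM mentioned above have standard definitions, sketched below. The excerpt does not give the paper's exact ssRM combination rule, so the simple average shown at the end is an illustrative placeholder, not the authors' formula.

```python
import numpy as np

def ratio_masks(speech_mag, noise_mag, eps=1e-8):
    """Standard ratio mask (sRM) and square-root ratio mask (srRM),
    computed per time-frequency (or time-subband) coefficient."""
    srm = speech_mag / (speech_mag + noise_mag + eps)
    srrm = np.sqrt(speech_mag ** 2 / (speech_mag ** 2 + noise_mag ** 2 + eps))
    return srm, srrm

# Three coefficients: speech-dominated, balanced, noise-dominated.
speech = np.array([5.0, 1.0, 0.1])
noise = np.array([0.1, 1.0, 5.0])
srm, srrm = ratio_masks(speech, noise)
blend = 0.5 * (srm + srrm)   # placeholder combination, NOT the paper's ssRM
```

Both masks lie in [0, 1] and are applied multiplicatively to the noisy coefficients; the srRM preserves signal energy rather than amplitude, which is why it is less aggressive than the sRM in speech-dominated regions.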