An adaptive noise sensing method is proposed to improve the speech sensing performance of speech-based applications operated over wireless sensor networks. The proposed method is based on nonnegative matrix factorization (NMF) and consists of two stages: adaptive noise sensing and noise reduction. Adaptive noise sensing is performed by adapting an a priori noise basis matrix of the NMF, which is estimated from the noise signal, resulting in an adapted noise basis matrix. Subsequently, the adapted noise basis matrix is used for the NMF decomposition of noisy speech into clean speech and background noise. The estimated clean speech signal is then passed to the front end of the speech-based application. The performance of the proposed NMF-based noise sensing and reduction method is first evaluated by measuring the source-to-distortion ratio (SDR), the source-to-interferences ratio (SIR), and the source-to-artifacts ratio (SAR). In addition, the proposed method is applied to an automatic speech recognition (ASR) system, which is a typical speech-based application, and the average word error rate (WER) of the ASR system is compared with that obtained using either a Wiener filter or a conventional NMF-based noise reduction method that uses only the a priori noise basis matrix.
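The decomposition described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a Euclidean cost with standard multiplicative updates, a fixed speech basis, and a Wiener-like mask for reconstruction, none of which the abstract specifies. All function names and the toy matrices are hypothetical, and the code uses plain Python lists so that it has no dependencies.

```python
import random

EPS = 1e-9  # guards against division by zero


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def squared_error(V, W, H):
    """Squared Frobenius distance between V and W @ H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def update_activations(V, W, H):
    """Multiplicative update of all activations H (Euclidean cost)."""
    Wt = transpose(W)
    num = matmul(Wt, V)
    den = matmul(matmul(Wt, W), H)
    return [[h * n / (d + EPS) for h, n, d in zip(rh, rn, rd)]
            for rh, rn, rd in zip(H, num, den)]


def adapt_noise_basis(V, W, H, n_speech):
    """Update only the noise columns of W; speech columns stay fixed."""
    Ht = transpose(H)
    num = matmul(V, Ht)
    den = matmul(matmul(W, H), Ht)
    return [[w if k < n_speech else w * n / (d + EPS)
             for k, (w, n, d) in enumerate(zip(rw, rn, rd))]
            for rw, rn, rd in zip(W, num, den)]


def separate(V, W_speech, W_noise_prior, n_iter=30, seed=0):
    """Adapt the a priori noise basis to V, then split V into speech and noise."""
    rng = random.Random(seed)
    n_speech = len(W_speech[0])
    # Stack the bases column-wise: W = [W_speech | W_noise].
    W = [list(rs) + list(rn) for rs, rn in zip(W_speech, W_noise_prior)]
    H = [[rng.uniform(0.1, 1.0) for _ in V[0]] for _ in W[0]]
    errors = [squared_error(V, W, H)]
    for _ in range(n_iter):
        H = update_activations(V, W, H)
        W = adapt_noise_basis(V, W, H, n_speech)
        errors.append(squared_error(V, W, H))
    # Model spectra of each source, then a Wiener-like soft mask (an assumption).
    S = matmul([row[:n_speech] for row in W], H[:n_speech])
    N = matmul([row[n_speech:] for row in W], H[n_speech:])
    speech = [[v * s / (s + n + EPS) for v, s, n in zip(rv, rs, rn)]
              for rv, rs, rn in zip(V, S, N)]
    noise = [[v * n / (s + n + EPS) for v, s, n in zip(rv, rs, rn)]
             for rv, rs, rn in zip(V, S, N)]
    return speech, noise, W, errors


# Toy magnitude spectrogram: 4 frequency bins x 5 frames (made-up values).
W_speech = [[1.0, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 1.0]]
W_noise_prior = [[0.5], [0.5], [0.5], [0.5]]
V = [[1.2, 0.9, 0.7, 1.5, 0.6],
     [1.0, 0.8, 0.9, 1.3, 0.5],
     [0.6, 1.4, 1.1, 0.7, 0.9],
     [0.7, 1.6, 1.0, 0.8, 1.2]]
speech_est, noise_est, W_adapted, errors = separate(V, W_speech, W_noise_prior)
```

In this sketch, a conventional NMF baseline of the kind the abstract compares against would correspond to skipping the `adapt_noise_basis` step, so that the a priori noise basis is used unchanged.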
This paper proposes a sound event detection (SED) method for tunnels to prevent further uncontrollable accidents. Tunnel accidents are accompanied by crashes and tire skids, which usually produce abnormal sounds. Because the tunnel environment always has a severe level of noise, the detection accuracy of existing methods can be greatly reduced. To deal with the noise issue in the tunnel environment, the proposed method involves the preprocessing of tunnel acoustic signals and a classifier for detecting acoustic events in tunnels. For preprocessing, a non-negative tensor factorization (NTF) technique is used to separate the acoustic event signal from the noisy signal in the tunnel. In particular, the NTF technique developed in this paper consists of source separation and online noise learning; that is, the noise basis is adapted by an online noise learning technique for enhancement in adverse noise conditions. Next, a convolutional recurrent neural network (CRNN) is extended to accommodate the contributions of the separated event signal and noise to the event detection; thus, the proposed CRNN is composed of event convolution layers and noise convolution layers in parallel, followed by recurrent layers and the output layer. Here, a set of mel-filterbank feature parameters is used as the input features. Evaluations of the proposed method are conducted on two datasets: a publicly available road audio events dataset and a tunnel audio dataset recorded in a real traffic tunnel over six months. In the first evaluation, where the background noise is low, the proposed CRNN-based SED method with online noise learning reduces the relative recognition error rate by 56.25% when compared to the conventional CRNN-based method with noise. In the second evaluation, where the tunnel background noise is more severe than in the first evaluation, the proposed CRNN-based SED method yields superior performance when compared to the conventional methods. In particular, it is shown that among all of the compared methods, the proposed method with online noise learning provides the best recognition rate of 91.07% and reduces the recognition error rates by 47.40% and 28.56% when compared to the Gaussian mixture model (GMM)–hidden Markov model (HMM)-based and conventional CRNN-based SED methods, respectively. The computational complexity measurements also show that the proposed CRNN-based SED method requires a processing time of 599 ms for both the NTF-based source separation with online noise learning and the CRNN classification when the noisy tunnel signal is one second long, which implies that the proposed method can detect events in real time.
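The arithmetic behind the reported figures can be made explicit with a short sketch. It assumes that the relative error reduction is defined as (baseline error - proposed error) / baseline error and that the error rate is 100% minus the recognition rate; the abstract states neither definition. The baseline error rates printed below are back-calculated under those assumptions and are not values reported in the abstract. The function names are hypothetical.

```python
def relative_error_reduction(baseline_error, proposed_error):
    """Fraction of the baseline error removed by the proposed method."""
    return (baseline_error - proposed_error) / baseline_error


def implied_baseline_error(proposed_error, reduction):
    """Invert the definition above to recover the baseline error rate."""
    return proposed_error / (1.0 - reduction)


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time per second of audio; below 1 means faster than real time."""
    return processing_seconds / audio_seconds


# Reported: best recognition rate of 91.07%, so an assumed error rate of 8.93%.
proposed_error = 100.0 - 91.07

# Back-calculated (not reported) baseline error rates, in percent.
for name, reduction in (("GMM-HMM", 0.4740), ("conventional CRNN", 0.2856)):
    baseline = implied_baseline_error(proposed_error, reduction)
    print(f"{name}: implied baseline error of about {baseline:.2f}%")

# Reported: 599 ms of processing for a one-second input signal.
rtf = real_time_factor(0.599, 1.0)
print(f"real-time factor: {rtf:.3f}")
```

A real-time factor below 1 is what the abstract's real-time claim amounts to: each second of audio is processed in less than one second.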
A noncoherent low-frequency ultrasonic (LFU) communication system is proposed for near-field communication using commercial off-the-shelf (COTS) speakers and microphones. Since the LFU communication channel is known to be frequency-selective, the proposed system is designed around differential phase-shift keying (DPSK) modulation with forward error correction. In addition, automatic gain control of the carrier frequency band over the LFU communication channel is proposed. Then, in order to optimize the symbol length of the proposed LFU communication system under a realistic aerial acoustic channel, a propagation model of the LFU communication channel is proposed by incorporating aerial acoustic attenuation. The performance of the proposed LFU communication system is demonstrated on two different tasks: bit error rate (BER) measurement and successful transmission rate (STR) comparison with Google Tone for various distances between the transmitter and the receiver. The results show that the proposed method can operate without bit errors at a distance of 8 m under various noise conditions with a sound pressure level of 80 dB. Moreover, the proposed method achieves a higher STR than Google Tone on a URL transmission task using two laptops.
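Why DPSK suits noncoherent reception can be seen in a minimal baseband sketch. It assumes binary DPSK, since the abstract does not state the modulation order, and it omits the carrier, forward error correction, automatic gain control, and the propagation model. The function names and the channel parameters are hypothetical.

```python
import cmath
import math


def dbpsk_modulate(bits):
    """Encode bits as phase changes: 1 flips the phase by pi, 0 keeps it."""
    phase = 0.0
    symbols = [cmath.exp(1j * phase)]  # reference symbol sent first
    for bit in bits:
        if bit:
            phase += math.pi
        symbols.append(cmath.exp(1j * phase))
    return symbols


def apply_channel(symbols, gain, phase_offset):
    """Toy channel: unknown constant gain and phase rotation, no noise."""
    rotation = gain * cmath.exp(1j * phase_offset)
    return [rotation * s for s in symbols]


def dbpsk_demodulate(symbols):
    """Noncoherent detection: compare each symbol with the previous one."""
    bits = []
    for previous, current in zip(symbols, symbols[1:]):
        # The product cancels the common phase offset; its sign carries the bit.
        bits.append(1 if (current * previous.conjugate()).real < 0 else 0)
    return bits


message = [1, 0, 1, 1, 0, 0, 1, 0]
received = apply_channel(dbpsk_modulate(message), gain=0.3, phase_offset=1.234)
decoded = dbpsk_demodulate(received)
```

Because each bit is carried by the phase difference between consecutive symbols, the receiver in this sketch never needs to estimate the absolute carrier phase or the channel gain.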
In this paper, a nonnegative matrix factorization (NMF)-based speech enhancement method that is robust to real and diverse noise is proposed; it uses online NMF dictionary learning and does not rely on prior knowledge of the noise. Conventional NMF-based methods have used a fixed noise dictionary, which often results in performance degradation when the NMF noise dictionary cannot cover the noise types that occur in real-life recordings. Thus, the noise dictionary needs to be learned from noises according to the variation of recording environments. To this end, the proposed method first estimates noise spectra and then performs online noise dictionary learning by a discriminative NMF learning framework. In particular, the noise spectra are estimated from minimum mean squared error filtering, which is based on the local sparsity defined by the a posteriori signal-to-noise ratio (SNR) estimated from the NMF separation of the previous analysis frame. The effectiveness of the proposed speech enhancement method is demonstrated by adding six different realistic noises to clean speech signals at various SNRs. The results show that the proposed method outperforms comparative methods in terms of signal-to-distortion ratio (SDR) and perceptual evaluation of speech quality (PESQ) for all kinds of simulated noise and SNR conditions.
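The role of the a posteriori SNR in noise estimation can be sketched as follows. This is not the paper's minimum mean squared error estimator or its local sparsity measure: it is a simplified stand-in that assumes a hard SNR threshold and recursive smoothing. The standard per-bin definition is used, namely noisy power divided by estimated noise power. The function names, the threshold, and the smoothing factor are hypothetical.

```python
def a_posteriori_snr(noisy_power, noise_power, floor=1e-12):
    """Per-bin a posteriori SNR: noisy power over estimated noise power."""
    return [y / max(n, floor) for y, n in zip(noisy_power, noise_power)]


def update_noise_estimate(noisy_power, prev_noise_power,
                          snr_threshold=2.0, smoothing=0.9):
    """Refresh the noise spectrum only in bins that look noise-dominated.

    prev_noise_power stands in for the noise estimate obtained from the
    previous frame (in the abstract, from the NMF separation of that frame).
    """
    gamma = a_posteriori_snr(noisy_power, prev_noise_power)
    updated = []
    for y, n, g in zip(noisy_power, prev_noise_power, gamma):
        if g < snr_threshold:
            # Low SNR: treat the bin as noise and smooth towards the observation.
            updated.append(smoothing * n + (1.0 - smoothing) * y)
        else:
            # High SNR: speech is likely present, so keep the previous estimate.
            updated.append(n)
    return updated, gamma


# Two frequency bins: the first is noise-like, the second is speech-dominated.
noise_estimate, gamma = update_noise_estimate([1.5, 10.0], [1.0, 1.0])
```

Noise spectra tracked in this way could then serve as the training data for an online noise dictionary update.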