Abstract-Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that for each noise type a model is required to be trained a priori. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods outperform the competing algorithms substantially.
This paper presents a method for pedestrian activity classification and gait analysis based on the microelectromechanical-systems inertial measurement unit (IMU). The work targets two groups of applications, including the following: 1) human activity classification and 2) joint human activity and gait-phase classification. In the latter case, the gait phase is defined as a substate of a specific gait cycle, i.e., the states of the body between the stance and swing phases. We model the pedestrian motion with a continuous hidden Markov model (HMM) in which the output density functions are assumed to be Gaussian mixture models. For the joint activity and gait-phase classification, motivated by the cyclical nature of the IMU measurements, each individual activity is modeled by a "circular HMM." For both the proposed classification methods, proper feature vectors are extracted from the IMU measurements. In this paper, we report the results of conducted experiments where the IMU was mounted on the humans' chests. This permits the potential application of the current study in camera-aided inertial navigation for positioning and personal assistance for future research works. Five classes of activity, including walking, running, going upstairs, going downstairs, and standing, are considered in the experiments. The performance of the proposed methods is illustrated in various ways, and as an objective measure, the confusion matrix is computed and reported. The achieved relative figure of merits using the collected data validates the reliability of the proposed methods for the desired applications.
This paper presents two single-channel speech dereverberation methods to enhance the quality of speech signals that have been recorded in an enclosed space. For both methods, the room acoustics are modeled using a non-negative approximation of the convolutive transfer function (N-CTF), and to additionally exploit the spectral properties of the speech signal, such as the low-rank nature of the speech spectrogram, the speech spectrogram is modeled using non-negative matrix factorization (NMF). Two methods are described to combine the N-CTF and NMF models. In the first method, referred to as the integrated method, a cost function is constructed by directly integrating the speech NMF model into the N-CTF model, while in the second method, referred to as the weighted method, the N-CTF and NMF based cost functions are weighted and summed. Efficient update rules are derived to solve both optimization problems. In addition, an extension of the integrated method is presented, which exploits the temporal dependencies of the speech signal. Several experiments are performed on reverberant speech signals with and without background noise, where the integrated method yields a considerably higher speech quality than the baseline N-CTF method and a state-of-the-art spectral enhancement method. Moreover, the experimental results indicate that the weighted method can even lead to a better performance in terms of instrumental quality measures, but that the optimal weighting parameter depends on the room acoustics and the utilized NMF model. Modeling the temporal dependencies in the integrated method was found to be useful only for highly reverberant conditions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.