In this paper, unsupervised learning is used to separate percussive and harmonic sounds from monaural non-vocal polyphonic signals. Our algorithm is based on a modified non-negative matrix factorization (NMF) procedure in which no labeled data are required to distinguish between percussive and harmonic bases, because prior information about both classes of sounds is integrated into the decomposition process. NMF is performed under the assumption that harmonic sounds exhibit spectral sparseness (narrowband sounds) and temporal smoothness (steady sounds), whereas percussive sounds exhibit spectral smoothness (broadband sounds) and temporal sparseness (transient sounds). The evaluation is performed using several real-world excerpts from different musical genres. A comparison of the developed approach with three current state-of-the-art separation systems yields promising results.
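A minimal sketch of this idea follows, assuming a magnitude spectrogram and standard Euclidean multiplicative updates. The function name hp_nmf, the penalty weights alpha and beta, and the simplified regularizers (temporal smoothness on harmonic activations, L1 temporal sparseness on percussive activations; the paper additionally constrains the spectral bases) are illustrative assumptions, not the paper's exact update rules.

```python
import numpy as np

def hp_nmf(V, n_harm=20, n_perc=20, n_iter=200, alpha=0.1, beta=0.1, eps=1e-12):
    """Toy NMF-based harmonic/percussive separation (illustrative sketch).

    V: magnitude spectrogram, shape (freq, time). The regularizers only
    loosely mirror the paper's constraints: temporal smoothness for
    harmonic activations, temporal sparseness for percussive ones.
    """
    F, T = V.shape
    K = n_harm + n_perc
    rng = np.random.default_rng(0)
    W = rng.random((F, K)) + eps            # spectral bases
    H = rng.random((K, T)) + eps            # temporal activations
    harm, perc = slice(0, n_harm), slice(n_harm, K)
    for _ in range(n_iter):
        # Standard Euclidean multiplicative update for W (Lee & Seung)
        WH = W @ H + eps
        W *= (V @ H.T) / (WH @ H.T + eps)
        WH = W @ H + eps
        num = W.T @ V
        den = W.T @ WH
        # Temporal smoothness penalty sum_t ||h_t - h_{t-1}||^2 on the
        # harmonic rows: gradient split into negative (numerator) and
        # positive (denominator) parts, constants absorbed into alpha.
        Hpad = np.pad(H[harm], ((0, 0), (1, 1)), mode="edge")
        num[harm] += alpha * (Hpad[:, :-2] + Hpad[:, 2:])
        den[harm] += 2 * alpha * H[harm]
        # L1 temporal sparseness penalty on the percussive rows.
        den[perc] += beta
        H *= num / (den + eps)
    V_h = W[:, harm] @ H[harm]              # harmonic spectrogram estimate
    V_p = W[:, perc] @ H[perc]              # percussive spectrogram estimate
    return V_h, V_p
```

Separated waveforms would then typically be recovered by Wiener-style masking of the mixture spectrogram with V_h and V_p followed by inverse STFT.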
This paper presents a real-time audio-to-score alignment system for musical applications. The aim of such systems is to synchronize a live musical performance with its symbolic representation in a music sheet. We take our previous real-time alignment system as a base and enhance it with a traceback stage, a stage used in offline alignment to improve the accuracy of the aligned notes. This stage introduces some delay, which forces a trade-off between output delay and alignment accuracy that must be considered in the design of this type of hybrid technique. We have also improved our former system to execute faster in order to minimize this delay. Other improvements, such as the identification of silence frames, have also been incorporated into the proposed system.
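The delay/accuracy trade-off can be illustrated with a minimal sketch: an online dynamic-programming alignment that keeps backpointers and only reports the score position for a frame after `delay` further frames have been observed, refining it by traceback. The cost function cost_fn, the step set, and the delay parameter are assumptions for illustration, not the authors' actual features or system.

```python
import numpy as np

def online_align_with_traceback(cost_fn, n_score, n_perf, delay=8):
    """Illustrative online alignment with a bounded traceback stage.

    cost_fn(i, j): local mismatch between performance frame i and score
    position j, assumed available frame by frame. `delay` is how long the
    output is held back so the path can be refined by traceback.
    """
    INF = np.inf
    D = np.full((n_perf + 1, n_score + 1), INF)
    D[0, 0] = 0.0
    back = np.zeros((n_perf + 1, n_score + 1), dtype=np.int8)
    steps = [(1, 0), (0, 1), (1, 1)]         # skip-in-score, skip-in-perf, match
    reported = []
    for i in range(1, n_perf + 1):           # one iteration per incoming frame
        for j in range(1, n_score + 1):
            cands = [D[i - di, j - dj] for di, dj in steps]
            k = int(np.argmin(cands))
            D[i, j] = cands[k] + cost_fn(i - 1, j - 1)
            back[i, j] = k
        # Greedy online estimate for the current frame.
        j_now = int(np.argmin(D[i, 1:])) + 1
        # Traceback stage: refine the estimate for frame i - delay by
        # following backpointers from the current best cell.
        if i > delay:
            ii, jj = i, j_now
            while ii > i - delay:
                di, dj = steps[back[ii, jj]]
                ii, jj = ii - di, jj - dj
            reported.append((ii, jj))        # (performance frame, score position)
    return reported
```

Larger `delay` lets the traceback correct more of the greedy path (higher accuracy) at the cost of later output, which is exactly the trade-off the abstract describes.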
In this paper we present a harmonic-constrained Multichannel Non-negative Matrix Factorization (MNMF) method for the task of blind music source separation. In this model, the mixing filter encodes the spatial information in terms of magnitude and phase differences between channels, whereas the source variances are modeled using a harmonic-constrained NMF structure. The spatial covariance matrix is obtained from the constant-Q transform (CQT) to account for the logarithmic frequency scale inherent in music signals and to reduce the dimensionality of the parameters. Moreover, to mitigate the strong sensitivity to parameter initialization, we propose to initialize the spatial weights with the output of the steered response power (SRP) with phase transform (PHAT) algorithm. The proposed method has been evaluated for the task of music source separation using a multichannel classical chamber music dataset with several polyphony and reverberation setups. Furthermore, comparisons with other state-of-the-art signal decomposition methods have been carried out, showing reliable results in terms of BSS_EVAL metrics.
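For reference, below is a self-contained sketch of GCC-PHAT, the pairwise building block of SRP-PHAT: the cross-power spectrum is whitened so that only phase information remains, and the resulting generalized cross-correlation peaks at the inter-channel time delay. In full SRP-PHAT these correlations are summed over all microphone pairs across a grid of candidate source locations; the framing and delay grid here are assumptions.

```python
import numpy as np

def gcc_phat_tdoa(x1, x2, fs, max_tau, n_fft=4096):
    """GCC-PHAT between two channels, scanned over candidate delays.

    Returns the estimated time difference of arrival (seconds) and the
    phase-transform-weighted cross-correlation over [-max_tau, max_tau].
    """
    X1 = np.fft.rfft(x1, n=n_fft)
    X2 = np.fft.rfft(x2, n=n_fft)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n_fft)        # generalized cross-correlation
    max_shift = int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / fs
    return tau, cc
```

Per the abstract, an SRP-PHAT power map built from such pairwise correlations is used only to initialize the spatial weights; the MNMF updates then refine both the spatial and spectral parameters.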
In this study, the authors present a novel voicing detection algorithm that employs the well-known aperiodicity measure to detect voiced speech in signals contaminated with non-stationary noise. The method computes a signal-adaptive decision threshold that takes into account the current noise level, enabling voicing detection by direct comparison with the extracted aperiodicity. This adaptive threshold is updated at each frame from a simple estimate of the current noise power, and is thus adapted to fluctuating noise conditions. Once the aperiodicity has been computed, the method requires only a small number of operations, which enables its implementation on resource-constrained devices (such as hearing aids) when an efficient approximation of the difference function is used to extract the aperiodicity. Evaluation over a database of speech sentences degraded by several types of noise reveals that the proposed voicing classifier is robust against different noise types and signal-to-noise ratios. In addition, to evaluate the applicability of the method to speech enhancement, a simple F0-based speech enhancement algorithm integrating the proposed classifier is implemented. The system is shown to achieve competitive results, in terms of objective measures, when compared with other well-known speech enhancement approaches.
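A minimal sketch of the overall scheme follows, assuming a YIN-style aperiodicity (the minimum of the cumulative-mean-normalized difference function) and a crude recursive noise-power tracker. The threshold rule thr = base + k * (noise power / frame power) and the tracker are illustrative assumptions standing in for the paper's exact estimator.

```python
import numpy as np

def aperiodicity(frame, tau_min=32, tau_max=400):
    """YIN-style aperiodicity: minimum of the cumulative-mean-normalized
    difference function. Near 0 => strongly periodic (voiced); requires
    len(frame) > tau_max."""
    d = np.array([np.sum((frame[:-t] - frame[t:]) ** 2)
                  for t in range(1, tau_max)])
    dprime = d * np.arange(1, tau_max) / (np.cumsum(d) + 1e-12)
    return dprime[tau_min:].min()

def voicing_decisions(frames, base_thr=0.35, k=0.5, thr_max=0.85, alpha=0.95):
    """Per-frame voiced/unvoiced decisions with an adaptive threshold.

    Higher noise inflates the aperiodicity of voiced frames, so the
    threshold is raised with the estimated noise-to-signal power ratio.
    """
    noise_pow = None
    decisions = []
    for frame in frames:
        p = np.mean(frame ** 2)
        # Crude minimum-statistics noise tracker: drops fast in quiet
        # frames, rises slowly otherwise (an assumption of this sketch).
        noise_pow = p if noise_pow is None else min(alpha * noise_pow + (1 - alpha) * p, p)
        nsr = noise_pow / (p + 1e-12)        # noise-to-signal power ratio
        thr = np.clip(base_thr + k * nsr, base_thr, thr_max)
        decisions.append(aperiodicity(frame) < thr)
    return decisions
```

The per-frame cost after the aperiodicity extraction is a handful of scalar operations, which is consistent with the low-complexity claim for hearing-aid deployment.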