Abstract-Multiple pitch estimation consists of estimating the fundamental frequencies and saliences of pitched sounds over short time frames of an audio signal. This task forms the basis of several applications in the particular context of musical audio. One approach is to decompose the short-term magnitude spectrum of the signal into a sum of basis spectra representing individual pitches scaled by time-varying amplitudes, using algorithms such as nonnegative matrix factorization (NMF). Prior training of the basis spectra is often infeasible due to the wide range of possible musical instruments. Appropriate spectra must then be adaptively estimated from the data, which may result in limited performance due to overfitting issues. In this article, we model each basis spectrum as a weighted sum of narrowband spectra representing a few adjacent harmonic partials, thus enforcing harmonicity and spectral smoothness while adapting the spectral envelope to each instrument. We derive a NMFlike algorithm to estimate the model parameters and evaluate it on a database of piano recordings, considering several choices for the narrowband spectra. The proposed algorithm performs similarly to supervised NMF using pre-trained piano spectra but improves pitch estimation performance by 6% to 10% compared to alternative unsupervised NMF algorithms.
Abstract-A new method for the estimation of multiple concurrent pitches in piano recordings is presented. It addresses the issue of overlapping overtones by modeling the spectral envelope of the overtones of each note with a smooth autoregressive model. For the background noise, a moving-average model is used and the combination of both tends to eliminate harmonic and sub-harmonic erroneous pitch estimations. This leads to a complete generative spectral model for simultaneous piano notes, which also explicitly includes the typical deviation from exact harmonicity in a piano overtone series. The pitch set which maximizes an approximate likelihood is selected from among a restricted number of possible pitch combinations as the one. Tests have been conducted on a large homemade database called MAPS, composed of piano recordings from a real upright piano and from high-quality samples.
Abstract-Gaussian process (GP) models are very popular for machine learning and regression and they are widely used to account for spatial or temporal relationships between multivariate random variables. In this paper, we propose a general formulation of underdetermined source separation as a problem involving GP regression. The advantage of the proposed unified view is firstly to describe the different underdetermined source separation problems as particular cases of a more general framework. Secondly, it provides a flexible means to include a variety of prior information concerning the sources such as smoothness, local stationarity or periodicity through the use of adequate covariance functions. Thirdly, given the model, it provides an optimal solution in the minimum mean squared error (MMSE) sense to the source separation problem. In order to make the GP models tractable for very large signals, we introduce framing as a GP approximation and we show that computations for regularly sampled and locally stationary GPs can be done very efficiently in the frequency domain. These findings establish a deep connection between GP and Nonnegative Tensor Factorizations with the Itakura-Saito distance and lead to effective methods to learn GP hyperparameters for very large and regularly sampled signals.
Abstract-This paper introduces a fast implementation of the power iteration method for subspace tracking, based on an approximation less restrictive than the well known projection approximation. This algorithm, referred to as the fast API method, guarantees the orthonormality of the subspace weighting matrix at each iteration. Moreover, it outperforms many subspace trackers related to the power iteration method, such as PAST, NIC, NP3 and OPAST, while having the same computational complexity. The API method is designed for both exponential windows and sliding windows. Our numerical simulations show that sliding windows offer a faster tracking response to abrupt signal variations.
International audienceWe address the issue of underdetermined source separation in a particular informed configuration where both the sources and the mixtures are known during a so-called encoding stage. This knowledge enables the computation of a side-information which is small enough to be inaudibly embedded into the mixtures. At the decoding stage, the sources are no longer assumed to be known, only the mixtures and the extracted side-information are processed for source separation. The proposed system models the sources as independent and locally stationary Gaussian processes (GP) and the mixing process as a linear filtering. This model allows reliable estimation of the sources through generalized Wiener filtering, provided their spectrograms are known. As these spectrograms are too large to be embedded in the mixtures, we show how they can be efficiently approximated using either Nonnegative Tensor Factorization (NTF) or image compression. A high-capacity embedding method is used by the system to inaudibly embed the separation side-information into the mixtures. This method is an application of the Quantization Index Modulation technique applied to the time-frequency coefficients of the mixtures and permits to reach embedding rates of about 250 kbps. Finally, a study of the performance of the full system is presented
High resolution methods such as the ESPRIT algorithm are of major interest for estimating discrete spectra, since they overcome the resolution limit of the Fourier transform and provide very accurate estimates of the signal parameters. In signal processing literature, most contributions focus on the estimation of exponentially modulated sinusoids in a noisy signal. In this paper, we introduce a more general class of signals, involving both amplitude and frequency modulations. We show that this Polynomial Amplitude Complex Exponentials (PACE) model is the most general model tractable by high resolution methods. We develop a generalized ESPRIT algorithm for estimating the signal parameters, and we show that this model can be characterized by means of a geometrical criterion.
In the recent years, many studies have focused on the singlesensor separation of independent waveforms using so-called softmasking strategies, where the short term Fourier transform of the mixture is multiplied element-wise by a ratio of spectrogram models. When the signals are wide-sense stationary, this strategy is theoretically justified as an optimal Wiener filtering: the power spectrograms of the sources are supposed to add up to yield the power spectrogram of the mixture. However, experience shows that using fractional spectrograms instead, such as the amplitude, yields good performance in practice, because they experimentally better fit the additivity assumption. To the best of our knowledge, no probabilistic interpretation of this filtering procedure was available to date. In this paper, we show that assuming the additivity of fractional spectrograms for the purpose of building soft-masks can be understood as separating locally stationary α-stable harmonizable processes, α-harmonizable in short, thus justifying the procedure theoretically.
This paper presents theoretical and experimental results about constrained non-negative matrix factorization (NMF) in a Bayesian framework. A model of superimposed Gaussian components including harmonicity is proposed, while temporal continuity is enforced through an inverse-Gamma Markov chain prior. We then exhibit a space-alternating generalized expectation-maximization (SAGE) algorithm to estimate the parameters. Computational time is reduced by initializing the system with an original variant of multiplicative harmonic NMF, which is described as well. The algorithm is then applied to perform polyphonic piano music transcription. It is compared to other state-of-the-art algorithms, especially NMF-based. Convergence issues are also discussed on a theoretical and experimental point of view. Bayesian NMF with harmonicity and temporal continuity constraints is shown to outperform other standard NMF-based transcription systems, providing a meaningful mid-level representation of the data. However, temporal smoothness has its drawbacks, as far as transients are concerned in particular, and can be detrimental to transcription performance when it is the only constraint used. Possible improvements of the temporal prior are discussed.Index Terms-Audio source separation, Bayesian regression, music transcription, non-negative matrix factorization (NMF), unsupervised machine learning.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.