We propose an audio inpainting framework that recovers portions of audio data distorted by impairments such as impulsive noise, clipping, and packet loss. In this framework, the distorted data are treated as missing and their location is assumed to be known. The signal is decomposed into overlapping time-domain frames and the restoration problem is then formulated as an inverse problem per audio frame. Sparse representation modeling is employed per frame, and each inverse problem is solved using the Orthogonal Matching Pursuit algorithm together with a discrete cosine or a Gabor dictionary. The Signal-to-Noise Ratio performance of this algorithm is shown to be comparable to or better than that of state-of-the-art methods when blocks of samples of variable durations are missing. We also demonstrate that the size of the block of missing samples, rather than the overall number of missing samples, is a crucial parameter for high-quality signal restoration. We further introduce a constrained Matching Pursuit approach for the special case of audio declipping that exploits the sign pattern of clipped audio samples and their maximal absolute value, and that allows the user to specify the maximum amplitude of the signal. This approach is shown to outperform state-of-the-art and commercially available methods for audio declipping in terms of Signal-to-Noise Rati
A new method for the estimation of multiple concurrent pitches in piano recordings is presented. It addresses the issue of overlapping overtones by modeling the spectral envelope of the overtones of each note with a smooth autoregressive model. For the background noise, a moving-average model is used, and the combination of both tends to eliminate harmonic and sub-harmonic erroneous pitch estimations. This leads to a complete generative spectral model for simultaneous piano notes, which also explicitly includes the typical deviation from exact harmonicity in a piano overtone series. The pitch set that maximizes an approximate likelihood is selected from among a restricted number of possible pitch combinations. Tests have been conducted on a large homemade database called MAPS, composed of piano recordings from a real upright piano and from high-quality samples.
We aim to assess the perceived quality of estimated source signals in the context of audio source separation. These signals may involve one or more kinds of distortions, including distortion of the target source, interference from the other sources or musical noise artifacts. We propose a subjective test protocol to assess the perceived quality with respect to each kind of distortion and collect the scores of 20 subjects over 80 sounds. We then propose a family of objective measures aiming to predict these subjective scores based on the decomposition of the estimation error into several distortion components and on the use of the PEMO-Q perceptual salience measure to provide multiple features that are then combined. These measures increase correlation with subjective scores up to 0.5 compared to nonlinear mapping of individual state-of-the-art source separation measures. Finally, we released the data and code presented in this paper in a freely available toolkit called PEASS.
We present a novel sparse-representation-based approach for the restoration of clipped audio signals. In the proposed approach, the clipped signal is decomposed into overlapping frames and the declipping problem is formulated as an inverse problem per audio frame. This problem is then solved by a constrained matching pursuit algorithm that exploits the sign pattern of the clipped samples and their maximal absolute value. Performance evaluation on a collection of music and speech signals demonstrates superior results compared to existing algorithms over a wide range of clipping levels.
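As a rough illustration of the per-frame inverse problem described above, the following sketch runs plain Orthogonal Matching Pursuit on the reliable (unclipped or non-missing) samples of a single frame and uses the resulting sparse model to fill in the remaining samples. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the overcomplete cosine dictionary, and the parameter values are hypothetical, and the sign and amplitude constraints of the constrained matching pursuit are deliberately omitted.

```python
import numpy as np

def dct_dictionary(n, n_atoms):
    """Overcomplete cosine dictionary (n samples x n_atoms atoms), unit-norm columns."""
    t = np.arange(n)[:, None] + 0.5
    k = np.arange(n_atoms)[None, :]
    D = np.cos(np.pi * t * k / n_atoms)
    return D / np.linalg.norm(D, axis=0)

def omp(D, y, n_nonzero, tol=1e-10):
    """Plain OMP: greedy atom selection followed by a least-squares refit."""
    # Atoms restricted to the reliable samples are no longer unit-norm,
    # so correlations are normalised by the restricted atom norms.
    norms = np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    residual = y.astype(float).copy()
    support = []
    used = np.zeros(D.shape[1], dtype=bool)
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(D.T @ residual) / norms
        corr[used] = -1.0  # never reselect an atom
        j = int(np.argmax(corr))
        used[j] = True
        support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    if support:
        coef[support] = sol
    return coef

def inpaint_frame(frame, reliable, n_atoms=None, n_nonzero=8):
    """Estimate the unreliable samples of one frame from the reliable ones.

    `reliable` is a boolean mask; for declipping it could be
    np.abs(frame) < clip_level (an assumption of this sketch).
    """
    n = len(frame)
    D = dct_dictionary(n, n_atoms or 2 * n)
    coef = omp(D[reliable], frame[reliable], n_nonzero)
    restored = np.asarray(frame, dtype=float).copy()
    restored[~reliable] = (D @ coef)[~reliable]  # keep reliable samples untouched
    return restored
```

In a full system, each restored frame would be windowed and overlap-added to rebuild the signal; a constrained variant would additionally require the estimate at clipped positions to respect the observed sign and to reach at least the clipping level.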
Recent computational strategies based on screening tests have been proposed to accelerate algorithms addressing penalized sparse regression problems such as the Lasso. Such approaches build upon the idea that it is worth dedicating some small computational effort to locate inactive atoms and remove them from the dictionary in a preprocessing stage, so that the regression algorithm working with a smaller dictionary will then converge faster to the solution of the initial problem. We believe that there is an even more efficient way to screen the dictionary and obtain a greater acceleration: inside each iteration of the regression algorithm, one may take advantage of the algorithm computations to obtain a new screening test for free, with increasing screening effects along the iterations. The dictionary is henceforth dynamically screened instead of being screened statically, once and for all, before the first iteration. We formalize this dynamic screening principle in a general algorithmic scheme and apply it by embedding, inside a number of first-order algorithms, adapted versions of existing screening tests to solve the Lasso or new screening tests to solve the Group-Lasso. Computational gains are assessed in a large set of experiments on synthetic data as well as real-world sounds and images. They show both the screening efficiency and the gain in terms of running times.
Compressed sensing is the ability to retrieve a sparse vector from a set of linear measurements. The task gets more difficult when the sensing process is not perfectly known. We address such a problem in the case where the sensors have been permuted, i.e., the order of the measurements is unknown. We propose a branch-and-bound algorithm that converges to the solution. The experimental study shows that our approach always retrieves the unknown permutation, while a simple convex relaxation strategy almost always fails. In terms of its time complexity, we show that the proposed algorithm converges quickly with respect to the combinatorial nature of the problem.
The efficiency of most pitch estimation methods declines when the analyzed frame is shortened and/or when a wide fundamental frequency (F0) range is targeted. The technique proposed herein jointly uses a periodicity analysis and a spectral matching process to improve the F0 estimation performance in such an adverse context: a 60 ms-long data frame together with the whole 7 1/4-octave piano tessitura. The enhancements are obtained thanks to a parametric approach which, among other things, models the inharmonicity of piano tones. The performance of the algorithm is assessed, compared to the results obtained from other estimators, and discussed in order to characterize their behavior and typical misestimations.
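To make the notion of piano inharmonicity concrete, the sketch below computes partial frequencies with the classical stiff-string law f_k = k · f0 · sqrt(1 + β k²), where β is an inharmonicity coefficient. The abstract does not state which model the authors use, so this law, the function name, and the example values are assumptions chosen purely for illustration.

```python
import math

def partial_frequencies(f0, beta, n_partials):
    """Partial frequencies (Hz) of a stiff string.

    f0   : fundamental frequency of the ideal (perfectly flexible) string, in Hz
    beta : inharmonicity coefficient (0 gives an exactly harmonic series)
    """
    return [k * f0 * math.sqrt(1.0 + beta * k * k)
            for k in range(1, n_partials + 1)]

# Hypothetical values: a 110 Hz note with a small inharmonicity coefficient.
for k, f in enumerate(partial_frequencies(110.0, 2e-4, 5), start=1):
    print(f"partial {k}: {f:.2f} Hz (harmonic position: {k * 110.0:.2f} Hz)")
```

With β > 0, each partial lies above its harmonic position and the gap widens with the partial index, which is why a strictly harmonic comb tends to mismatch the upper partials of piano tones.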
When solving inverse problems and using optimization methods with matrix variables in signal processing and machine learning, it is customary to assume some low-rank prior on the targeted solution. Nonnegative matrix factorization of spectrograms is a case in point in audio signal processing. However, this low-rank prior is not straightforwardly related to complex matrices obtained from a short-time Fourier (or discrete Gabor) transform (STFT), which is generally defined from and studied based on a modulation operator and a translation operator applied to a so-called window. This paper is a first study of the low-rankness property of time-frequency matrices. We characterize the set of signals with a rank-r (complex) STFT matrix in the case of a unit hop size and frequency step with few assumptions on the transform parameters. We discuss the scope of this result and its implications on low-rank approximations of STFT matrices.
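A small numerical experiment can illustrate what a low-rank complex STFT matrix looks like. The sketch below builds a circular STFT with unit hop size and unit frequency step, then checks that a single complex exponential yields a rank-1 matrix and a sum of two yields rank 2. This is only an elementary example and is not the characterization established in the paper; the circular boundary handling, the Hann window, the function names, and the rank threshold are all assumptions of this sketch.

```python
import numpy as np

def circular_stft(x, window):
    """Unit-hop, unit-frequency-step STFT with circular time indexing.

    X[k, n] = sum_m x[(n + m) % N] * w[m] * exp(-2j*pi*k*m/N)
    """
    N = len(x)
    w = np.zeros(N, dtype=complex)
    w[:len(window)] = window  # zero-pad the window to the signal length
    # Column n holds the windowed frame starting at sample n.
    frames = np.stack([np.roll(x, -n) * w for n in range(N)], axis=1)
    return np.fft.fft(frames, axis=0)

def numerical_rank(M, rel_tol=1e-10):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

N = 64
n = np.arange(N)
window = np.hanning(16)
one_tone = np.exp(2j * np.pi * 3 * n / N)
two_tones = one_tone + 0.5 * np.exp(2j * np.pi * 10 * n / N)

print(numerical_rank(circular_stft(one_tone, window)))   # rank of a single exponential
print(numerical_rank(circular_stft(two_tones, window)))  # rank of a two-exponential sum
```

For a complex exponential, shifting the frame only multiplies it by a phase factor, so every column of the STFT matrix is a scalar multiple of the first one; this is what makes the matrix rank 1 in this setting.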