We address the audio declipping problem using iterative thresholding algorithms and the principle of social sparsity. This recently introduced approach features thresholding/shrinkage operators that model dependencies between neighboring coefficients in expansions over time-frequency dictionaries. A new unconstrained convex formulation of the audio declipping problem is introduced. The chosen structured thresholding operators are the so-called windowed group-Lasso and the persistent empirical Wiener. Using these operators significantly improves the quality of the reconstruction compared to simple soft-thresholding. The resulting algorithm is fast, simple to implement, and outperforms the state of the art in terms of signal-to-noise ratio.
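The difference between plain soft-thresholding and the two neighborhood-based operators named in the abstract can be illustrated with a minimal sketch. It assumes the gain formulas as they are commonly stated in the social sparsity literature: soft-thresholding uses the magnitude of each coefficient alone, windowed group-Lasso uses the square root of the energy in a neighborhood, and persistent empirical Wiener uses the neighborhood energy itself. The function names, the uniform neighborhood weights, and the one-dimensional window along a single coefficient sequence are simplifying assumptions made here for illustration. The sketch covers only the shrinkage step, not the declipping algorithm or its convex formulation.

```python
import math


def soft_threshold(coeffs, lam):
    """Shrink each coefficient independently: gain = (1 - lam / |c|)_+."""
    out = []
    for c in coeffs:
        mag = abs(c)
        gain = max(1.0 - lam / mag, 0.0) if mag > 0 else 0.0
        out.append(gain * c)
    return out


def neighborhood_energy(coeffs, half_width):
    """Sum of |c|^2 over a sliding window around each index.

    Assumption: uniform weights and a window truncated at the borders.
    """
    n = len(coeffs)
    energies = []
    for k in range(n):
        lo = max(0, k - half_width)
        hi = min(n, k + half_width + 1)
        energies.append(sum(abs(c) ** 2 for c in coeffs[lo:hi]))
    return energies


def windowed_group_lasso(coeffs, lam, half_width):
    """Gain = (1 - lam / sqrt(neighborhood energy))_+ applied to each coefficient."""
    out = []
    for c, e in zip(coeffs, neighborhood_energy(coeffs, half_width)):
        gain = max(1.0 - lam / math.sqrt(e), 0.0) if e > 0 else 0.0
        out.append(gain * c)
    return out


def persistent_empirical_wiener(coeffs, lam, half_width):
    """Gain = (1 - lam^2 / neighborhood energy)_+ applied to each coefficient."""
    out = []
    for c, e in zip(coeffs, neighborhood_energy(coeffs, half_width)):
        gain = max(1.0 - lam ** 2 / e, 0.0) if e > 0 else 0.0
        out.append(gain * c)
    return out


if __name__ == "__main__":
    # Hypothetical coefficients: a weak one (index 2) between two strong ones,
    # and an isolated weak one (index 6).
    x = [0.0, 3.0, 0.5, 4.0, 0.0, 0.0, 0.2, 0.0]
    print("soft:", soft_threshold(x, 1.0))
    print("WGL: ", windowed_group_lasso(x, 1.0, 1))
    print("PEW: ", persistent_empirical_wiener(x, 1.0, 1))
```

In this toy input, soft-thresholding discards the weak coefficient at index 2 because it is judged in isolation, whereas both neighborhood-based operators retain it because its neighbors carry energy. The isolated weak coefficient at index 6 is set to zero by all three operators.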
Abstract: Sparse and structured signal expansions on dictionaries can be obtained through explicit modeling in the coefficient domain. The originality of the present article lies in the construction and the study of generalized shrinkage operators, whose goal is to identify structured significance maps and give rise to structured thresholding. These generalize Group Lasso and the previously introduced Elitist Lasso by introducing more flexibility in the coefficient domain modeling, and lead to the notion of social sparsity. The proposed operators are studied theoretically and embedded in iterative thresholding algorithms. Moreover, a link between these operators and a convex functional is established. Numerical studies on both simulated and real signals confirm the benefits of such an approach.
Index Terms: Structured Sparsity, Iterative Thresholding, Convex Optimization
I. INTRODUCTION
A wide range of inverse problems arising in signal processing have benefited from sparsity. Introduced in the mid 90's by Chen, Donoho and Saunders [1], the idea is that a signal can be efficiently represented as a linear combination of elementary atoms chosen from an appropriate dictionary. Here, efficiently may be understood in the sense that only few atoms are needed to reconstruct the signal. The same idea appeared in the machine learning community [2], where often only few variables are relevant in inference tasks based on observations living in very high dimensional spaces.
The natural measure of the cardinality of a support set, and hence its sparsity, is the ℓ0 "norm", which counts the number of non-zero coefficients. Minimizing such a penalty leads to a combinatorial problem, which is usually relaxed into an ℓ1 norm, which is convex.
Solving an inverse problem by using the sparse principle can be done by the following steps:
• Choose a dictionary where the signal of interest is supposed to be sparse. Such a choice is driven by the nature
A curious divide characterizes the usage of audio descriptors for timbre research in music information research (MIR) and music psychology. While MIR uses a multitude of audio descriptors for tasks such as automatic instrument classification, only a highly constrained set is used to describe the physical correlates of timbre perception in parts of music psychology. We argue that this gap is not coincidental and results from differences in the two fields' methodologies, epistemic groundwork, and research goals. This paper lays out perspectives on the emergence of the divide and reviews studies in both fields with regard to divergences in research methods and goals. [...
This paper investigates the role of acoustic and categorical information in timbre dissimilarity ratings. Using a Gammatone-filterbank-based sound transformation, we created tones that were rated as less familiar than recorded tones from orchestral instruments and that were harder to associate with an unambiguous sound source (Experiment 1). A subset of transformed tones, a set of orchestral recordings, and a mixed set were then rated on pairwise dissimilarity (Experiment 2A). We observed that recorded instrument timbres clustered into subsets that distinguished timbres according to acoustic and categorical properties. For the subset of cross-category comparisons in the mixed set, we observed asymmetries in the distribution of ratings, as well as a stark decay of inter-rater agreement. These effects were replicated in a more robust within-subjects design (Experiment 2B) and cannot be explained by acoustic factors alone. We finally introduced a novel model of timbre dissimilarity based on partial least-squares regression that compared the contributions of both acoustic and categorical timbre descriptors. The best model fit (R² = 0.88) was achieved when both types of descriptors were taken into account. These findings are interpreted as evidence for an interplay of acoustic and categorical information in timbre dissimilarity perception.