Abstract -- Sparse data models, in which data are assumed to be well represented as a linear combination of a few elements from a dictionary, have gained considerable attention in recent years, and their use has led to state-of-the-art results in many signal and image processing tasks. It is now well understood that the choice of the sparsity regularization term is critical to the success of such models. In this work, we use tools from information theory to propose a sparsity regularization term that has several theoretical and practical advantages over the more standard $\ell_0$ or $\ell_1$ ones, and that leads to improved coding performance and accuracy in reconstruction tasks. We also briefly report on further improvements obtained by imposing low mutual coherence and low Gram matrix norm on the learned dictionaries.
I. INTRODUCTION

Sparse modeling calls for constructing a succinct representation of some data as a combination of a few typical patterns (atoms) learned from the data itself. Significant contributions to the theory and practice of learning such collections of atoms (usually called dictionaries or codebooks) have been made in recent years. A critical component of sparse modeling is the actual sparsity of the representation, which is controlled by some model parameters. Choosing the optimal values of these parameters for the actual signals to model and the problem at hand is a challenging task. Several solutions to this problem have been proposed, ranging from the automatic tuning of the parameters [15] to Bayesian hierarchical models, where these parameters are themselves considered as random variables [14], [15], [24].

In this paper we address this challenge and, at the same time, further generalize the standard sparsifying penalty functions (or priors for short) by exploiting tools from information theory. The result is a prior that has several desirable theoretical and practical properties, such as statistical consistency and improved robustness to outliers in the data, and that in practice leads to better sparse reconstruction than $\ell_0$- and $\ell_1$-based techniques. This new model is complemented by imposing incoherence on the learned dictionary.
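To make concrete the role played by the regularization parameter discussed above, the following is a minimal sketch (not the method proposed in this paper) of the standard $\ell_1$-regularized sparse coding baseline, solved with the iterative shrinkage-thresholding algorithm (ISTA). The dictionary D, signal x, and weight lam are illustrative placeholders; how faithfully the sparse support is recovered depends directly on the choice of lam, which is precisely the parameter-tuning problem the proposed prior aims to sidestep.

    import numpy as np

    def ista_sparse_code(D, x, lam, n_iter=200):
        """Solve min_a 0.5*||x - D a||_2^2 + lam*||a||_1 via ISTA.

        D   : (m, k) dictionary with unit-norm columns (atoms)
        x   : (m,) signal to encode
        lam : regularization weight controlling the sparsity of a
        """
        # Step size from the Lipschitz constant of the data-term gradient.
        L = np.linalg.norm(D, 2) ** 2
        a = np.zeros(D.shape[1])
        for _ in range(n_iter):
            # Gradient step on the quadratic data-fidelity term.
            grad = D.T @ (D @ a - x)
            z = a - grad / L
            # Soft-thresholding: the proximal operator of lam*||.||_1.
            a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
        return a

    # Toy usage: a random dictionary and a signal that is truly 3-sparse.
    rng = np.random.default_rng(0)
    D = rng.standard_normal((64, 256))
    D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
    a_true = np.zeros(256)
    a_true[rng.choice(256, 3, replace=False)] = rng.standard_normal(3)
    x = D @ a_true + 0.01 * rng.standard_normal(64)

    a_hat = ista_sparse_code(D, x, lam=0.05)
    print("nonzeros recovered:", np.count_nonzero(np.abs(a_hat) > 1e-3))

In this sketch, setting lam too small yields a dense, overfitted code, while setting it too large suppresses true atoms; the information-theoretic prior developed in the sequel is motivated by removing this sensitivity.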