Given a dictionary D ؍ {d ᠪ k} of vectors d ᠪ k, we seek to represent a signal S ᠪ as a linear combination S ᠪ ؍ ͚ k ␥(k)d ᠪ k, with scalar coefficients ␥(k). In particular, we aim for the sparsest representation possible. In general, this requires a combinatorial optimization process. Previous work considered the special case where D is an overcomplete system consisting of exactly two orthobases and has shown that, under a condition of mutual incoherence of the two bases, and assuming that S ᠪ has a sufficiently sparse representation, this representation is unique and can be found by solving a convex optimization problem: specifically, minimizing the ഞ 1 norm of the coefficients ᠪ ␥. In this article, we obtain parallel results in a more general setting, where the dictionary D can arise from two or several bases, frames, or even less structured systems. We sketch three applications: separating linear features from planar ones in 3D data, noncooperative multiuser encoding, and identification of over-complete independent component models.W orkers throughout engineering and the applied sciences frequently want to represent data (signals, images) in the most parsimonious terms. In signal analysis specifically, they often consider models proposing that the signal of interest is sparse in some transform domain, such as the wavelet or Fourier domain (1). However, there is a growing realization that many signals are mixtures of diverse phenomena, and no single transform can be expected to describe them well; instead, we should consider models making sparse combinations of generating elements from several different transforms (1-4). Unfortunately, as soon as we start considering general collections of generating elements, the attempt to find sparse solutions enters mostly uncharted territory, and one expects at best to use plausible heuristic methods (5-8) and certainly to give up hope of rigorous optimality. In this article, we will develop some rigorous results showing that it can be possible to find optimally sparse representations by efficient techniques in certain cases.Suppose we are given a dictionary D of generating ele-L , each one a vector in C N , which we assume normal-The dictionary D can be viewed a matrix of size N ϫ L, with generating elements for columns. We do not suppose any fixed relationship between N and L. In particular, the dictionary can be overcomplete and contain linearly dependent subsets, and in particular need not be a basis. As examples of such dictionaries, we can mention: wavelet packets and cosine packets dictionaries of Coifman et al. (3), which contain L ϭ N log(N) elements, representing transient harmonic phenomena with a variety of durations and locations; wavelet frames, such as the directional wavelet frames of Ron and Shen (9), which contain L ϭ CN elements for various constants C Ͼ 1; and the combined ridgelet͞wavelet systems of Starck, Candès, and Donoho (8,10). Faced with such variety, we cannot call individual elements in the dictionary basis elements; we will use the term...