Abstract: State-of-the-art methods for solving smooth optimization problems include nonlinear conjugate gradient, limited-memory BFGS, and Majorize-Minimize (MM) subspace algorithms. The MM subspace algorithm, which was introduced more recently, has shown good practical performance compared with other methods on various optimization problems arising in signal and image processing. However, to the best of our knowledge, no general result exists concerning the theoretical convergence rate of the MM subspace algorithm. This…
“…In addition, in a nonstationary context, a theoretical study of the tracking abilities of the algorithm should be conducted. Finally, let us emphasize that a detailed analysis of the convergence rate of the proposed method has been undertaken in our recent paper [72].…”
Section: Results
“…Based on our recent results in [72], we provide a convergence rate result for Algorithm (20) in the case when the functions $(\psi_s)_{1 \le s \le S}$ are convex and twice differentiable. Then, there exists almost surely $n_\epsilon \in \mathbb{N} \setminus \{0\}$ such that, for every $n \ge n_\epsilon$, …”
Stochastic approximation techniques play an important role in solving many problems encountered in machine learning and adaptive signal processing. In these contexts, the statistics of the data are often unknown a priori, or their direct computation is too costly, so they have to be estimated online from the observed signals. For batch optimization of an objective function that is the sum of a data fidelity term and a penalization (e.g., a sparsity-promoting function), Majorize-Minimize (MM) methods have recently attracted much interest since they are fast, highly flexible, and effective in ensuring convergence. The goal of this paper is to show how these methods can be successfully extended to the case when the data fidelity term corresponds to a least squares criterion and the cost function is replaced by a sequence of stochastic approximations of it. In this context, we propose an online version of an MM subspace algorithm and study its convergence using suitable probabilistic tools. Simulation results illustrate the good practical performance of the proposed algorithm, associated with a memory gradient subspace, when applied to both non-adaptive and adaptive filter identification problems.
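To make the setting concrete, below is a minimal, hypothetical Python/NumPy sketch of one possible online MM memory-gradient update for a least-squares data fidelity term plus a smooth hyperbolic (sparsity-promoting) penalty. The function name `online_3mg`, the `stream` of data blocks, and the running-average statistics are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def online_3mg(stream, dim, lam=0.1, delta=1e-2, n_iter=500):
    """Hypothetical sketch: online MM memory-gradient update for
    F(x) = (1/2) E||y - H x||^2 + lam * sum_i sqrt(delta^2 + x_i^2),
    with the expectation replaced by running sample averages."""
    x, x_prev = np.zeros(dim), np.zeros(dim)
    R = np.zeros((dim, dim))   # running estimate of E[H^T H]
    r = np.zeros(dim)          # running estimate of E[H^T y]
    for n, (H, y) in enumerate(stream, start=1):
        # Stochastic approximation of the second-order statistics.
        R += (H.T @ H - R) / n
        r += (H.T @ y - r) / n
        # Majorant curvature of the hyperbolic penalty at the current point.
        w = 1.0 / np.sqrt(delta**2 + x**2)
        # Gradient of the current approximate cost.
        g = R @ x - r + lam * x * w
        # Memory-gradient subspace: current gradient and previous direction.
        D = np.column_stack([-g, x - x_prev]) if n > 1 else -g[:, None]
        # Quadratic majorant restricted to the subspace; closed-form step.
        A = R + lam * np.diag(w)
        u = np.linalg.pinv(D.T @ A @ D) @ (-D.T @ g)
        x_prev, x = x, x + D @ u
        if n >= n_iter:
            break
    return x
```

In a filter identification experiment, `stream` would yield successive blocks of input/output samples `(H, y)`, and the iterate `x` would track the unknown filter coefficients.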
“…leads to the so-called MM Memory Gradient (3MG) algorithm [9], [10], whose strong practical performance has been assessed in [9], [19]. It is worth noting that the quadratic structure of h makes a solution u_k to (5) easy to determine:…”
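The formula elided at the end of this excerpt is not reproduced on this page; in the 3MG literature (e.g., [9], [10]), the subspace step typically takes the closed form

$$u_k = -\big(D_k^\top A(x_k)\, D_k\big)^{\dagger} D_k^\top \nabla f(x_k), \qquad D_k = \big[-\nabla f(x_k) \,\big|\, x_k - x_{k-1}\big],$$

where $A(x_k)$ is the curvature matrix of the quadratic majorant $h(\cdot, x_k)$ and $\dagger$ denotes the pseudo-inverse; this is a reconstruction from the cited works, not the verbatim missing equation.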
In a learning context, data distributions are usually unknown, and observation models are sometimes complex. In an inverse problem setup, these facts often lead to the minimization of a loss function whose analytic expression is uncertain, so its gradient cannot be evaluated exactly. These issues have promoted the development of so-called stochastic optimization methods, which are able to cope with stochastic errors in the gradient term. A natural strategy is to start from a deterministic optimization approach as a baseline, and to incorporate a stabilization procedure (e.g., decreasing stepsize, averaging) that yields improved robustness to stochastic errors. In the context of large-scale, differentiable optimization, an important class of methods relies on the principle of majorization-minimization (MM). MM algorithms are becoming increasingly popular in signal/image processing [18], [36] and machine learning [27], [34], [38]. MM approaches are fast, stable, require limited manual tuning, and are often preferred by practitioners in application domains such as medical imaging [16] and telecommunications [29]. The present work introduces novel theoretical convergence guarantees for MM algorithms when approximate gradient terms are employed, generalizing recent work [11], [27] to a wider class of functions and algorithms. We illustrate our theoretical results on a binary classification problem.
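As an illustration of MM with approximate gradients (a hedged sketch, not the paper's algorithm), the following Python snippet runs MM iterations for $\ell_2$-regularized logistic regression on a binary classification task, replacing the exact gradient of the quadratic majorant with a mini-batch estimate. The curvature bound `L` comes from the standard $1/4$ bound on the second derivative of the logistic loss; all names are illustrative.

```python
import numpy as np

def mm_logistic_sgd(X, y, lam=1e-3, epochs=20, batch=32, seed=0):
    """Sketch: MM iterations for l2-regularized logistic regression
    where the majorant's gradient is replaced by a mini-batch estimate.
    Labels y are assumed to lie in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    # Quadratic majorant curvature: sigma''(t) <= 1/4 bounds each
    # per-sample Hessian by (1/4)||x_i||^2 I, plus the l2 term.
    L = 0.25 * np.max(np.sum(X**2, axis=1)) + lam
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(n // batch, 1)):
            Xb, yb = X[idx], y[idx]
            # Stochastic (mini-batch) gradient of the regularized loss.
            g = -Xb.T @ (yb / (1.0 + np.exp(yb * (Xb @ w)))) / len(idx) + lam * w
            # Minimizing the quadratic majorant yields a 1/L gradient step.
            w -= g / L
    return w
```

With an exact full-batch gradient this reduces to a standard MM scheme with monotone descent; the stochastic variant trades monotonicity for per-iteration cost, which is the regime analyzed in the paper.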
“…In order to find a minimizer of F, we propose a Majorize-Minimize (MM) approach, following the ideas in [67,64,68,69,70,71]. At each iteration of an MM algorithm, one constructs a tangent function that majorizes the given cost function and is equal to it at the current iterate.…”
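For reference, the tangent majorant $h(\cdot, x_k)$ described in this excerpt satisfies, in standard MM notation (our summary, not a quote from the cited works),

$$h(x, x_k) \ge F(x) \quad \forall x, \qquad h(x_k, x_k) = F(x_k), \qquad x_{k+1} \in \operatorname*{argmin}_{x} h(x, x_k),$$

which guarantees the monotone decrease $F(x_{k+1}) \le h(x_{k+1}, x_k) \le h(x_k, x_k) = F(x_k)$.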
In recent years, there has been growing interest in mathematical models leading to the minimization, in a symmetric matrix space, of a Bregman divergence coupled with a regularization term. We address problems of this type within a general framework where the regularization term is split into two parts, one being a spectral function while the other is arbitrary. A Douglas-Rachford approach is proposed to address such problems, and a list of proximity operators is provided, allowing various choices for the fit-to-data functional and for the regularization term. Based on our theoretical results, two novel approaches are proposed for the noisy graphical lasso problem, where a covariance or precision matrix has to be statistically estimated in the presence of noise. The Douglas-Rachford approach applies directly to the estimation of the covariance matrix. When the precision matrix is sought, we solve a nonconvex optimization problem. More precisely, we propose a majorization-minimization approach building a sequence of convex surrogates and solving the inner optimization subproblems via the aforementioned Douglas-Rachford procedure. We establish conditions for the convergence of this iterative scheme and illustrate the good numerical performance of the proposed approaches relative to state-of-the-art methods on synthetic and real-world datasets.
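For concreteness, here is a generic Douglas-Rachford iteration in Python with placeholder proximity operators (a Frobenius-norm data fit and an $\ell_1$ regularizer via soft-thresholding). The paper's actual operators involve Bregman divergences and spectral functions, so this is only an illustrative sketch of the splitting scheme itself.

```python
import numpy as np

def douglas_rachford(prox_f, prox_g, z0, gamma=1.0, n_iter=200):
    """Standard Douglas-Rachford iteration for min_X f(X) + g(X),
    given the proximity operators of f and g (placeholders here)."""
    z = z0.copy()
    for _ in range(n_iter):
        x = prox_f(z, gamma)                    # first proximal step
        z = z + prox_g(2 * x - z, gamma) - x    # reflected update
    return prox_f(z, gamma)

# Illustrative choices (assumptions, not the paper's exact operators):
# prox of (1/2)||X - S||_F^2 and prox of the l1 norm (soft-thresholding).
S = np.eye(3)  # stand-in for an observed matrix
prox_f = lambda X, gamma: (X + gamma * S) / (1 + gamma)
prox_g = lambda X, gamma, lam=0.1: np.sign(X) * np.maximum(np.abs(X) - gamma * lam, 0)
X_hat = douglas_rachford(prox_f, prox_g, np.zeros((3, 3)))
```

The sequence of first proximal steps converges to a minimizer of f + g under standard convexity assumptions, which is why the paper can reuse this scheme as the inner solver of its majorization-minimization loop.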