2020
DOI: 10.48550/arxiv.2004.03639
Preprint

Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization

Abstract: Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method, the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG), to solve perhaps the most popular instance, i.e., the ℓ1-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the …
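
The abstract describes a two-step scheme: a proximal stochastic gradient step, followed by an orthant step restricted to the sign pattern predicted by that first step. The snippet below is a minimal NumPy sketch of that idea, not the authors' exact algorithm; the step size eta, the regularization weight lam, and the projection rule in the orthant step are illustrative assumptions.

    import numpy as np

    def soft_threshold(z, tau):
        # Proximal operator of tau * ||x||_1 (soft-thresholding).
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def prox_sg_step(x, stoch_grad, eta, lam):
        # Step (i): proximal stochastic gradient step; its zero/nonzero
        # pattern serves as a prediction of the solution's support.
        return soft_threshold(x - eta * stoch_grad(x), eta * lam)

    def orthant_step(x, stoch_grad, eta, lam):
        # Step (ii): optimize within the orthant defined by sign(x), where
        # lam * ||x||_1 is linear, then set to zero any coordinate that
        # would cross the orthant boundary (illustrative projection rule).
        sign = np.sign(x)
        g = stoch_grad(x) + lam * sign
        x_new = x - eta * g
        x_new[np.sign(x_new) != sign] = 0.0
        return x_new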

Cited by 6 publications (17 citation statements) | References 22 publications
“…To the best of our knowledge, we are the first to explore the over-parameterization issue appearing in the state-of-the-art DNN models for video interpolation. Concretely, we compress the recently proposed AdaCoF [32] via fine-grained pruning [67] based on sparsity-inducing optimization [7], and show that a 10× compressed AdaCoF is still able to maintain a similar benchmark performance as before, indicating a considerable amount of redundancy in the original model. The compression provides us two direct benefits: (i) it helps us understand the model architecture in depth, which in turn inspires an efficient design; (ii) the obtained compact model makes more room for further improvements that could potentially boost the performance to a new level.…”
Section: Introduction (mentioning)
confidence: 94%
“…It is known that with appropriately chosen λ the formulation (2) promotes a sparse solution, with which one can easily identify those important connections among neurons, namely the ones corresponding to non-zero weights. Towards solving (2), we utilize the newly proposed orthant-based stochastic method [7] for its efficient mechanism in promoting sparsity and less performance regression compared with other solvers. By solving the ℓ1-regularized problem (2), we indeed perform a fine-grained pruning since zeros are promoted in an unstructured manner.…”
Section: First Stage: Compression of the Baseline (mentioning)
confidence: 99%
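
The quoted passage trains with an ℓ1-regularized objective and then reads the unstructured (fine-grained) pruning mask directly off the nonzero weights. A rough sketch of that pipeline is given below, under the assumption that formulation (2) has the generic shape f(w) + λ‖w‖1; the function names and the threshold tol are illustrative, not taken from the citing paper.

    import numpy as np

    def l1_regularized_objective(data_loss, weights, lam):
        # Generic shape f(w) + lam * ||w||_1 assumed for formulation (2);
        # with a suitable lam this promotes exact zeros in the weights.
        return data_loss + lam * sum(np.abs(w).sum() for w in weights)

    def fine_grained_prune_masks(weights, tol=0.0):
        # Unstructured pruning: keep exactly the entries the sparse solution
        # left nonzero, i.e., the "important connections among neurons".
        return [np.abs(w) > tol for w in weights]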
“…Although Prox-SVRG/SAGA may have better theoretical convergence properties than Prox-SG, they require higher time and space complexity to compute or estimate the full gradient on a huge mini-batch or to store previous gradients, which may be prohibitive for large-scale training, especially when memory is limited. Besides, it is well noticed that SVRG does not work as desired on popular non-convex deep learning applications (Defazio & Bottou, 2019; Chen et al., 2020). In contrast, Prox-SG is efficient and can also achieve the good initialization assumption in Theorem 1.…”
Section: The Initialization Stage Selection (mentioning)
confidence: 99%
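
The complexity contrast in this quote comes from what each method must compute and store per step: Prox-SG needs only a mini-batch gradient, while Prox-SVRG periodically needs a full-gradient snapshot (and SAGA stores one gradient per sample). A hedged sketch of the two updates follows, with soft_threshold as the ℓ1 proximal operator; the argument names are illustrative, not from either paper.

    import numpy as np

    def soft_threshold(z, tau):
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def prox_sg_update(x, grad_minibatch, eta, lam):
        # Prox-SG: one mini-batch gradient per step, no extra state kept.
        return soft_threshold(x - eta * grad_minibatch(x), eta * lam)

    def prox_svrg_update(x, x_snap, full_grad_snap, grad_minibatch, eta, lam):
        # Prox-SVRG: also needs a snapshot point and its full gradient,
        # recomputed periodically over the whole dataset; that snapshot is
        # the extra time/space cost the quoted passage points to.
        v = grad_minibatch(x) - grad_minibatch(x_snap) + full_grad_snap
        return soft_threshold(x - eta * v, eta * lam)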
“…Unless Ω(x) is ‖x‖₁ where each g ∈ G is a singleton, then S_k becomes an orthant face (Chen et al., 2020).…”
mentioning
confidence: 99%
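
For reference, the orthant face S_k mentioned in this quote is, in the standard orthant-based setting, the set of points whose coordinates either share the sign of the current iterate x^k or are zero; the notation below is an assumed reconstruction, not copied from the cited paper.

    % Orthant face induced by the sign pattern of the iterate x^k
    % (standard definition; notation assumed, not taken from the paper).
    \[
      S_k \;=\; \bigl\{\, x \in \mathbb{R}^n \;:\;
        x_i \,\operatorname{sign}(x_i^k) \ge 0 \ \text{and}\
        x_i = 0 \ \text{if } x_i^k = 0,\quad i = 1,\dots,n \,\bigr\}.
    \]
    % On S_k the regularizer is linear:
    %   \lambda \|x\|_1 = \lambda \sum_i \operatorname{sign}(x_i^k)\, x_i,
    % which is why the orthant step can treat the objective as smooth there.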