2020
DOI: 10.48550/arxiv.2004.03639
Preprint

Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization

Abstract: Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method, the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG), to solve perhaps the most popular instance, i.e., the ℓ1-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the …
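
The abstract describes a two-step scheme: a proximal stochastic gradient step, followed by an orthant step restricted to the sign pattern predicted by that first step. The snippet below is a minimal NumPy sketch of that idea, not the authors' exact algorithm; the step size eta, the regularization weight lam, and the projection rule in the orthant step are illustrative assumptions.

    import numpy as np

    def soft_threshold(z, tau):
        # Proximal operator of tau * ||x||_1 (soft-thresholding).
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def prox_sg_step(x, stoch_grad, eta, lam):
        # Step (i): proximal stochastic gradient step; its zero/nonzero
        # pattern serves as a prediction of the solution's support.
        return soft_threshold(x - eta * stoch_grad(x), eta * lam)

    def orthant_step(x, stoch_grad, eta, lam):
        # Step (ii): optimize within the orthant defined by sign(x), where
        # lam * ||x||_1 is linear, then set to zero any coordinate that
        # would cross the orthant boundary (illustrative projection rule).
        sign = np.sign(x)
        g = stoch_grad(x) + lam * sign
        x_new = x - eta * g
        x_new[np.sign(x_new) != sign] = 0.0
        return x_new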

Cited by 6 publications (17 citation statements) | References 22 publications
“…To the best of our knowledge, we are the first to explore the over-parameterization issue appearing in the state-of-the-art DNN models for video interpolation. Concretely, we compress the recently proposed AdaCoF [32] via fine-grained pruning [67] based on sparsity-inducing optimization [7], and show that a 10× compressed AdaCoF is still able to maintain a similar benchmark performance as before, indicating a considerable amount of redundancy in the original model. The compression provides us two direct benefits: (i) it helps us understand the model architecture in depth, which in turn inspires an efficient design; (ii) the obtained compact model makes more room for further improvements that could potentially boost the performance to a new level.…”
Section: Introduction (mentioning)
confidence: 94%
“…It is known that with appropriately chosen λ the formulation (2) promotes a sparse solution, with which one can easily identify those important connections among neurons, namely the ones corresponding to non-zero weights. Towards solving (2), we utilize the newly proposed orthant-based stochastic method [7] for its efficient mechanism in promoting sparsity and less performance regression compared with other solvers. By solving the ℓ1-regularized problem (2), we indeed perform a fine-grained pruning since zeros are promoted in an unstructured manner.…”
Section: First Stage: Compression of the Baseline (mentioning)
confidence: 99%
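
The quoted passage trains with an ℓ1-regularized objective and then reads the unstructured (fine-grained) pruning mask directly off the nonzero weights. A rough sketch of that pipeline is given below, under the assumption that formulation (2) has the generic shape f(w) + λ‖w‖1; the function names and the threshold tol are illustrative, not taken from the citing paper.

    import numpy as np

    def l1_regularized_objective(data_loss, weights, lam):
        # Generic shape f(w) + lam * ||w||_1 assumed for formulation (2);
        # with a suitable lam this promotes exact zeros in the weights.
        return data_loss + lam * sum(np.abs(w).sum() for w in weights)

    def fine_grained_prune_masks(weights, tol=0.0):
        # Unstructured pruning: keep exactly the entries the sparse solution
        # left nonzero, i.e., the "important connections among neurons".
        return [np.abs(w) > tol for w in weights]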
“…Although Prox-SVRG/SAGA may have better theoretical convergence properties than Prox-SG, they require higher time and space complexity to compute or estimate the full gradient on a huge mini-batch or to store previous gradients, which may be prohibitive for large-scale training, especially when memory is limited. Besides, it is well noticed that SVRG does not work as desired on popular non-convex deep learning applications (Defazio & Bottou, 2019; Chen et al., 2020). In contrast, Prox-SG is efficient and can also achieve the good initialization assumption in Theorem 1.…”
Section: The Initialization Stage Selection (mentioning)
confidence: 99%
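
The complexity contrast in this quote comes from what each method must compute and store per step: Prox-SG needs only a mini-batch gradient, while Prox-SVRG periodically needs a full-gradient snapshot (and SAGA stores one gradient per sample). A hedged sketch of the two updates follows, with soft_threshold as the ℓ1 proximal operator; the argument names are illustrative, not from either paper.

    import numpy as np

    def soft_threshold(z, tau):
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def prox_sg_update(x, grad_minibatch, eta, lam):
        # Prox-SG: one mini-batch gradient per step, no extra state kept.
        return soft_threshold(x - eta * grad_minibatch(x), eta * lam)

    def prox_svrg_update(x, x_snap, full_grad_snap, grad_minibatch, eta, lam):
        # Prox-SVRG: also needs a snapshot point and its full gradient,
        # recomputed periodically over the whole dataset; that snapshot is
        # the extra time/space cost the quoted passage points to.
        v = grad_minibatch(x) - grad_minibatch(x_snap) + full_grad_snap
        return soft_threshold(x - eta * v, eta * lam)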
“…Unless Ω(x) is ‖x‖₁ where each g ∈ G is a singleton, then S_k becomes an orthant face (Chen et al., 2020).…”
mentioning
confidence: 99%
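
For reference, the orthant face S_k mentioned in this quote is, in the standard orthant-based setting, the set of points whose coordinates either share the sign of the current iterate x^k or are zero; the notation below is an assumed reconstruction, not copied from the cited paper.

    % Orthant face induced by the sign pattern of the iterate x^k
    % (standard definition; notation assumed, not taken from the paper).
    \[
      S_k \;=\; \bigl\{\, x \in \mathbb{R}^n \;:\;
        x_i \,\operatorname{sign}(x_i^k) \ge 0 \ \text{and}\
        x_i = 0 \ \text{if } x_i^k = 0,\quad i = 1,\dots,n \,\bigr\}.
    \]
    % On S_k the regularizer is linear:
    %   \lambda \|x\|_1 = \lambda \sum_i \operatorname{sign}(x_i^k)\, x_i,
    % which is why the orthant step can treat the objective as smooth there.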