2017
DOI: 10.1016/j.neucom.2017.02.029

Group sparse regularization for deep neural networks

Abstract: In this paper, we address the challenging task of simultaneously optimizing (i) the weights of a neural network, (ii) the number of neurons for each hidden layer, and (iii) the subset of active input features (i.e., feature selection). While these problems are traditionally dealt with separately, we propose an efficient regularized formulation enabling their simultaneous parallel execution, using standard optimization routines. Specifically, we extend the group Lasso penalty, originally proposed in the linear …
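As a rough illustration of the formulation sketched in the abstract, the snippet below adds a group Lasso penalty to the training loss of a PyTorch multilayer perceptron, with one group per input feature or hidden neuron (the column of outgoing weights in each linear layer). The architecture, the regularization strength, and the data are placeholders, not the paper's experimental setup.

```python
import torch
import torch.nn as nn

# A small multilayer perceptron; sizes are illustrative.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 10),
)

def group_lasso_penalty(layer: nn.Linear) -> torch.Tensor:
    # nn.Linear stores weight as (out_features, in_features), so column j
    # holds every outgoing connection of input/neuron j. Penalizing the
    # Euclidean norm of each column encourages whole columns -- i.e. whole
    # neurons or input features -- to become zero.
    return layer.weight.norm(p=2, dim=0).sum()

lam = 1e-3                                  # regularization strength (illustrative)
x = torch.randn(8, 20)
y = torch.randint(0, 10, (8,))

data_loss = nn.functional.cross_entropy(model(x), y)
reg = sum(group_lasso_penalty(m) for m in model if isinstance(m, nn.Linear))
loss = data_loss + lam * reg
loss.backward()                             # an SGD/Adam step would follow
```

Driving a whole group to zero removes the corresponding neuron or input feature, which is how a single penalty can perform architecture selection and feature selection while the weights are being learned.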

Year Published: 2017–2024

Cited by 385 publications (268 citation statements)
References: 38 publications
“…The most important change of our training method is the application of L1 regularization to the weights [23]. While L1 regularization is primarily a way to avoid overfitting, it has a desirable side effect: sparsifying the network's weight matrices.…”
Section: The Training Methods (mentioning)
confidence: 99%
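For readers unfamiliar with the technique this excerpt refers to, here is a minimal sketch of adding an L1 penalty to the loss of a PyTorch model; the model, the strength λ, and the data are illustrative only, not the cited authors' setup.

```python
import torch
import torch.nn as nn

model = nn.Linear(100, 10)                  # stand-in for the cited network
lam = 1e-4                                  # illustrative strength

x = torch.randn(16, 100)
y = torch.randint(0, 10, (16,))

data_loss = nn.functional.cross_entropy(model(x), y)
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = data_loss + lam * l1_penalty
loss.backward()
# Repeated over training, the penalty both discourages overfitting and
# drives a large fraction of the weights toward zero.
```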
“…While L1 regularization is primarily a way to avoid overfitting, it has a desirable side effect: sparsifying the network's weight matrices. While most state-of-the-art methods of pruning devote considerable computational expense to find the least influential weights, using L1 regularization ensures that the majority of the weights are already effectively zero and can be pruned without affecting the network at all [23].…”
Section: The Training Methods (mentioning)
confidence: 99%
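The pruning step this excerpt describes can be as simple as magnitude thresholding once L1 regularization has pushed most weights near zero; a sketch follows, with an illustrative layer and threshold.

```python
import torch
import torch.nn as nn

# After L1-regularized training, most weights sit near zero and can be
# removed by simple magnitude thresholding (threshold is illustrative).
layer = nn.Linear(100, 10)
threshold = 1e-3

with torch.no_grad():
    mask = layer.weight.abs() >= threshold   # weights worth keeping
    layer.weight[~mask] = 0.0                # prune the rest

sparsity = 1.0 - mask.float().mean().item()
print(f"fraction of weights pruned: {sparsity:.2%}")
```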
“…Other work focusing on the sparsity of parameters in DNN is also related to this article. In [22], group Lasso penalty is adopted to impose group-level sparsity on networks connections. But this work can hardly satisfy our standardization constraints, i.e.…”
Section: Relation With Previous Work (mentioning)
confidence: 99%
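To make the group-level sparsity mentioned in this excerpt concrete, the following sketch applies block soft-thresholding, the proximal operator of the group Lasso penalty, to a layer's weight matrix, with one group per column of outgoing connections. The layer size and the threshold τ are illustrative.

```python
import torch
import torch.nn as nn

def group_soft_threshold(weight: torch.Tensor, tau: float) -> torch.Tensor:
    # Proximal operator of the group Lasso penalty with column groups:
    # each column (the outgoing connections of one unit) is shrunk by tau
    # in norm, and columns whose norm falls below tau become exactly zero.
    norms = weight.norm(p=2, dim=0, keepdim=True)
    scale = torch.clamp(1.0 - tau / norms.clamp_min(1e-12), min=0.0)
    return weight * scale

layer = nn.Linear(64, 32)
with torch.no_grad():
    layer.weight.copy_(group_soft_threshold(layer.weight, tau=0.05))

zero_groups = int((layer.weight.norm(p=2, dim=0) == 0).sum())
print(f"{zero_groups} of {layer.weight.shape[1]} groups are exactly zero")
```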
“…In this case, a pruned model can take advantage of dense matrix computation just as the original model does. L1 norm regularization is commonly used to enforce individual parameter sparsity, and Group Lasso [39] is normally used to enforce structural parameter sparsity [32].…”
Section: Related Work (mentioning)
confidence: 99%
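The excerpt's point about dense matrix computation can be illustrated by actually shrinking a layer after group-level pruning: zeroed neuron groups are dropped, and the remaining weights still form ordinary dense matrices. In the sketch below the zeroed groups are produced by hand purely for illustration; shapes and names are placeholders.

```python
import torch
import torch.nn as nn

# Two consecutive layers; pretend group sparsity zeroed most hidden units.
fc1, fc2 = nn.Linear(20, 64), nn.Linear(64, 10)
with torch.no_grad():
    fc1.weight[10:] = 0.0
    fc1.bias[10:] = 0.0

keep = fc1.weight.abs().sum(dim=1) != 0      # hidden units that survive
n_keep = int(keep.sum())

fc1_small, fc2_small = nn.Linear(20, n_keep), nn.Linear(n_keep, 10)
with torch.no_grad():
    fc1_small.weight.copy_(fc1.weight[keep])
    fc1_small.bias.copy_(fc1.bias[keep])
    fc2_small.weight.copy_(fc2.weight[:, keep])
    fc2_small.bias.copy_(fc2.bias)

# The pruned network is smaller but still runs as ordinary dense matmuls.
x = torch.randn(4, 20)
print(fc2_small(torch.relu(fc1_small(x))).shape)   # torch.Size([4, 10])
```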