2019
DOI: 10.48550/arxiv.1906.04866
Preprint

On regularization for a convolutional kernel in neural networks

Abstract: The convolutional neural network is an important model in deep learning. To avoid exploding/vanishing gradient problems and to improve the generalizability of a neural network, it is desirable to have a convolution operation that nearly preserves the norm, or equivalently, to have the singular values of the transformation matrix corresponding to a convolutional kernel bounded around 1. We propose a penalty function that can be used in the optimization of a convolutional neural network to constrain the singular values of the transformation matrix…
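
For readers who want a concrete picture of the kind of regularizer described in the abstract, the following is a minimal PyTorch sketch of one way to penalize the singular values of a convolutional layer. The function names, shapes, and the exact form of the penalty are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: materialize the linear map of a small convolution and
# penalize deviation of its singular values from 1. This is an illustrative
# approximation of a singular-value penalty, not the paper's exact method.
import torch
import torch.nn.functional as F


def conv_transform_matrix(kernel, in_shape, stride=1, padding=1):
    """Return the dense matrix M such that M @ vec(x) == vec(conv2d(x, kernel))."""
    in_ch, h, w = in_shape
    in_dim = in_ch * h * w
    # Feed every standard basis vector through the convolution; the responses
    # are the columns of the transformation matrix.
    basis = torch.eye(in_dim, dtype=kernel.dtype).reshape(in_dim, in_ch, h, w)
    out = F.conv2d(basis, kernel, stride=stride, padding=padding)
    return out.reshape(in_dim, -1).T  # shape: (out_dim, in_dim)


def singular_value_penalty(kernel, in_shape, target=1.0):
    """Penalty sum_i (sigma_i - target)^2 over the conv's singular values."""
    m = conv_transform_matrix(kernel, in_shape)
    s = torch.linalg.svdvals(m)  # differentiable singular values
    return ((s - target) ** 2).sum()


if __name__ == "__main__":
    weight = torch.randn(4, 3, 3, 3, requires_grad=True)  # (out_ch, in_ch, kh, kw)
    penalty = singular_value_penalty(weight, in_shape=(3, 8, 8))
    penalty.backward()  # gradients flow back to the kernel
    print(float(penalty))
```

In a training loop such a term would be added to the task loss with a weighting coefficient; the paper's actual penalty, and whether it targets all singular values or only the extreme ones, should be taken from the full text.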

Cited by 1 publication (2 citation statements) | References 8 publications
“…Network weight regularizers dominate the deep learning regularizer literature because they support a large spectrum of tasks and architectures. Singular value decomposition (SVD) has been applied as a weight regularizer in several recent works (Zhang et al., 2018; Sedghi et al., 2018; Guo & Ye, 2019). Zhang et al. (2018) employ SVD to avoid vanishing and exploding gradients in recurrent neural networks.…”
Section: Related Work
Mentioning confidence: 99%
“…Zhang et al. (2018) employ SVD to avoid vanishing and exploding gradients in recurrent neural networks. Similarly, Guo & Ye (2019) bound the singular values of the convolutional layer around 1 to preserve the layer's input and output norms. A bounded output norm mitigates the exploding/vanishing gradient problem.…”
Section: Related Work
Mentioning confidence: 99%
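
For context, the norm-preservation argument referenced in these citation statements follows from the standard singular-value bound for any linear map M (a general fact, not a result specific to this paper):

\sigma_{\min}(M)\,\|x\|_2 \;\le\; \|Mx\|_2 \;\le\; \sigma_{\max}(M)\,\|x\|_2

So if every singular value of the convolution's transformation matrix is close to 1, each layer approximately preserves the norm of its input, and backpropagated gradients are neither amplified nor attenuated layer by layer.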