2021
DOI: 10.48550/arxiv.2105.09513
Preprint

Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions

Abstract: We propose a new class of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions. KNNs employ the Kronecker product, which provides an efficient way of constructing a very wide network while keeping the number of parameters low. Our theoretical analysis reveals that under suitable conditions, KNNs induce a faster decay of the loss than feed-forward networks. This is also empirically verified through a set of computation…
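The abstract's key point, that a Kronecker product of two small factor matrices acts as one very wide weight matrix, can be illustrated with a short sketch. This is a minimal NumPy illustration of the general mechanism, not the paper's KNN architecture; the shapes are arbitrary assumptions, and the "vec trick" identity (A ⊗ B) vec(X) = vec(B X Aᵀ) is standard linear algebra.

```python
import numpy as np

# Illustrative shapes: A (4x4) and B (16x16) together hold 4*4 + 16*16 = 272
# parameters, but A ⊗ B acts as a dense 64x64 matrix with 4096 entries.
rng = np.random.default_rng(0)
p, q, r, s = 4, 4, 16, 16
A = rng.standard_normal((p, q))
B = rng.standard_normal((r, s))

x = rng.standard_normal(q * s)

# Naive: materialize the big matrix.
y_dense = np.kron(A, B) @ x

# Efficient: the vec trick  (A ⊗ B) vec(X) = vec(B X Aᵀ)  (column-major vec),
# which never forms the (p*r x q*s) matrix.
X = x.reshape(q, s).T                   # column-major unvec of x into an s x q matrix
y_fast = (B @ X @ A.T).T.reshape(-1)    # column-major vec of the r x p result

assert np.allclose(y_dense, y_fast)
print("dense entries:", p * r * q * s, "| stored parameters:", A.size + B.size)
```

Widening the implicit network means enlarging the factor dimensions, while the stored parameter count grows only additively, which is the efficiency the abstract refers to.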

Cited by 1 publication (2 citation statements). References 36 publications (52 reference statements).

Citation statements, ordered by relevance:
“…The Kronecker product has already been incorporated in several areas within the deep learning framework: (i) in [14,19] the authors apply a Kronecker product decomposition (KPD) to decompose the weight matrices of a trained network, although this typically requires a large number of terms for acceptable accuracy and is thus of limited applicability; (ii) a generalized KPD is extended to multi-dimensional tensors in [8] to reduce the number of weight parameters and the computational complexity of convolutional neural networks; (iii) the Kronecker product has been shown to be a viable way to reduce the computational time of back-propagation via an approximate inverse of the Fisher information matrix [13], providing a means to increase the decay rate of the loss; (iv) a "Kronecker neural network" is established in [10] to implement adaptive activation functions in order to avoid local minima during training. We emphasize that our approach is distinct from these methods, as we fundamentally alter the network architecture in an attempt to accelerate training.…”

Section: Related Work
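Point (i) in the quote refers to approximating a trained weight matrix by a (sum of) Kronecker product(s). As a hedged sketch of how a single-term approximation can be computed, the classical Van Loan–Pitsianis approach rearranges W so that the optimal factors fall out of a rank-1 SVD; the function name and shapes below are illustrative assumptions, not taken from the cited works [14,19].

```python
import numpy as np

def nearest_kronecker(W, m1, n1, m2, n2):
    """Best single-term approximation W ~= kron(A, B), A (m1 x n1), B (m2 x n2),
    via rearrangement + rank-1 SVD (Van Loan-Pitsianis)."""
    # Reorder W so each m2 x n2 block becomes one row of R; then
    # ||W - kron(A, B)||_F equals ||R - flat(A) flat(B)^T||_F.
    R = (W.reshape(m1, m2, n1, n2)
          .transpose(0, 2, 1, 3)
          .reshape(m1 * n1, m2 * n2))
    U, sv, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(sv[0]) * U[:, 0].reshape(m1, n1)
    B = np.sqrt(sv[0]) * Vt[0].reshape(m2, n2)
    return A, B

# Sanity check: an exact Kronecker product is recovered.
rng = np.random.default_rng(1)
A0, B0 = rng.standard_normal((3, 2)), rng.standard_normal((4, 5))
A, B = nearest_kronecker(np.kron(A0, B0), 3, 2, 4, 5)
assert np.allclose(np.kron(A, B), np.kron(A0, B0))
```

As the quote notes, a single term is often a poor fit for a generic trained matrix; keeping the top k singular triplets yields a k-term KPD, at the cost of k pairs of factors.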
“…We now discuss the computational cost of a KDL-NN and give a broad technical explanation of why we expect the KDL-NN to be more efficient in practice. Given a KDL-NN defined by (10), gradient descent updates are performed on layers L down to 2 via the relations,…”

Section: Numerical Cost of Forward Operations and Back-propagation
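The quoted relations are truncated above, so the following is only a generic illustration, not the cited paper's KDL-NN update equations: gradients of a loss with respect to the Kronecker factors can be written using the same small matrix products as the forward pass, which is the broad reason back-propagation stays cheap for such layers. Shapes follow the vec-trick sketch earlier; all names here are assumptions for illustration.

```python
import numpy as np

def kron_layer_grads(A, B, X, G):
    """Gradients through Y = B @ X @ A.T  (i.e. y = (A kron B) x via the vec trick).

    A: (p, q) factor      B: (r, s) factor
    X: (s, q) input       G: (r, p) upstream gradient dL/dY
    Returns dL/dA, dL/dB, dL/dX, all via small matrix products,
    never forming the (p*r x q*s) Kronecker matrix.
    """
    dA = G.T @ B @ X      # (p, q)
    dB = G @ A @ X.T      # (r, s)
    dX = B.T @ G @ A      # (s, q)
    return dA, dB, dX

# Finite-difference spot check on dA[0, 0] (the layer is linear in A,
# so the one-sided difference matches the gradient up to rounding).
rng = np.random.default_rng(2)
p, q, r, s = 3, 2, 4, 5
A, B = rng.standard_normal((p, q)), rng.standard_normal((r, s))
X, G = rng.standard_normal((s, q)), rng.standard_normal((r, p))

loss = lambda A_: np.sum(G * (B @ X @ A_.T))
dA, dB, dX = kron_layer_grads(A, B, X, G)
eps = 1e-6
Ap = A.copy(); Ap[0, 0] += eps
assert np.isclose((loss(Ap) - loss(A)) / eps, dA[0, 0], atol=1e-4)
```

Counting operations in the function above, every product involves only the small factors, so per-step cost scales with the factor sizes rather than with the size of the implicit dense matrix, consistent with the efficiency argument the quoted section goes on to make.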