2021
DOI: 10.48550/arxiv.2105.09513
Preprint

Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions

Abstract: We propose a new class of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions. KNNs employ the Kronecker product, which provides an efficient way of constructing a very wide network while keeping the number of parameters low. Our theoretical analysis reveals that under suitable conditions, KNNs induce a faster decay of the loss than feed-forward networks. This is also empirically verified through a set of computation…
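The abstract's key point, that a Kronecker product of two small factor matrices acts as one very wide weight matrix, can be illustrated with a short sketch. This is a minimal NumPy illustration of the general mechanism, not the paper's KNN architecture; the shapes are arbitrary assumptions, and the "vec trick" identity (A ⊗ B) vec(X) = vec(B X Aᵀ) is standard linear algebra.

```python
import numpy as np

# Illustrative shapes: A (4x4) and B (16x16) together hold 4*4 + 16*16 = 272
# parameters, but A ⊗ B acts as a dense 64x64 matrix with 4096 entries.
rng = np.random.default_rng(0)
p, q, r, s = 4, 4, 16, 16
A = rng.standard_normal((p, q))
B = rng.standard_normal((r, s))

x = rng.standard_normal(q * s)

# Naive: materialize the big matrix.
y_dense = np.kron(A, B) @ x

# Efficient: the vec trick  (A ⊗ B) vec(X) = vec(B X Aᵀ)  (column-major vec),
# which never forms the (p*r x q*s) matrix.
X = x.reshape(q, s).T                   # column-major unvec of x into an s x q matrix
y_fast = (B @ X @ A.T).T.reshape(-1)    # column-major vec of the r x p result

assert np.allclose(y_dense, y_fast)
print("dense entries:", p * r * q * s, "| stored parameters:", A.size + B.size)
```

Widening the implicit network means enlarging the factor dimensions, while the stored parameter count grows only additively, which is the efficiency the abstract refers to.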

Cited by 1 publication (2 citation statements). References 36 publications (52 reference statements).

Citation statements, ordered by relevance:
“…The Kronecker product has already been incorporated in several areas within the deep learning framework: (i) in [14,19] the authors apply a Kronecker product decomposition (KPD) to decompose the weight matrices of a trained network, although this typically requires a large number of terms for acceptable accuracy and is thus of limited applicability; (ii) a generalized KPD is extended to multi-dimensional tensors in [8] to reduce the number of weight parameters and the computational complexity of convolutional neural networks; (iii) the Kronecker product has been shown to be a viable way to reduce the computational time of back-propagation via an approximate inverse of the Fisher information matrix [13], providing a means to increase the decay rate of the loss; (iv) a "Kronecker neural network" is established in [10] to implement adaptive activation functions in order to avoid local minima during training. We emphasize that our approach is distinct from these methods, as we fundamentally alter the network architecture in an attempt to accelerate training.…”

Section: Related Work
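Point (i) in the quote refers to approximating a trained weight matrix by a (sum of) Kronecker product(s). As a hedged sketch of how a single-term approximation can be computed, the classical Van Loan–Pitsianis approach rearranges W so that the optimal factors fall out of a rank-1 SVD; the function name and shapes below are illustrative assumptions, not taken from the cited works [14,19].

```python
import numpy as np

def nearest_kronecker(W, m1, n1, m2, n2):
    """Best single-term approximation W ~= kron(A, B), A (m1 x n1), B (m2 x n2),
    via rearrangement + rank-1 SVD (Van Loan-Pitsianis)."""
    # Reorder W so each m2 x n2 block becomes one row of R; then
    # ||W - kron(A, B)||_F equals ||R - flat(A) flat(B)^T||_F.
    R = (W.reshape(m1, m2, n1, n2)
          .transpose(0, 2, 1, 3)
          .reshape(m1 * n1, m2 * n2))
    U, sv, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(sv[0]) * U[:, 0].reshape(m1, n1)
    B = np.sqrt(sv[0]) * Vt[0].reshape(m2, n2)
    return A, B

# Sanity check: an exact Kronecker product is recovered.
rng = np.random.default_rng(1)
A0, B0 = rng.standard_normal((3, 2)), rng.standard_normal((4, 5))
A, B = nearest_kronecker(np.kron(A0, B0), 3, 2, 4, 5)
assert np.allclose(np.kron(A, B), np.kron(A0, B0))
```

As the quote notes, a single term is often a poor fit for a generic trained matrix; keeping the top k singular triplets yields a k-term KPD, at the cost of k pairs of factors.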
“…We now discuss the computational cost of a KDL-NN and give a broad technical explanation of why we expect the KDL-NN to be more efficient in practice. Given a KDL-NN defined by (10), gradient descent updates are performed on layers L down to 2 via the relations,…”

Section: Numerical Cost of Forward Operations and Back-propagation
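The quoted relations are truncated above, so the following is only a generic illustration, not the cited paper's KDL-NN update equations: gradients of a loss with respect to the Kronecker factors can be written using the same small matrix products as the forward pass, which is the broad reason back-propagation stays cheap for such layers. Shapes follow the vec-trick sketch earlier; all names here are assumptions for illustration.

```python
import numpy as np

def kron_layer_grads(A, B, X, G):
    """Gradients through Y = B @ X @ A.T  (i.e. y = (A kron B) x via the vec trick).

    A: (p, q) factor      B: (r, s) factor
    X: (s, q) input       G: (r, p) upstream gradient dL/dY
    Returns dL/dA, dL/dB, dL/dX, all via small matrix products,
    never forming the (p*r x q*s) Kronecker matrix.
    """
    dA = G.T @ B @ X      # (p, q)
    dB = G @ A @ X.T      # (r, s)
    dX = B.T @ G @ A      # (s, q)
    return dA, dB, dX

# Finite-difference spot check on dA[0, 0] (the layer is linear in A,
# so the one-sided difference matches the gradient up to rounding).
rng = np.random.default_rng(2)
p, q, r, s = 3, 2, 4, 5
A, B = rng.standard_normal((p, q)), rng.standard_normal((r, s))
X, G = rng.standard_normal((s, q)), rng.standard_normal((r, p))

loss = lambda A_: np.sum(G * (B @ X @ A_.T))
dA, dB, dX = kron_layer_grads(A, B, X, G)
eps = 1e-6
Ap = A.copy(); Ap[0, 0] += eps
assert np.isclose((loss(Ap) - loss(A)) / eps, dA[0, 0], atol=1e-4)
```

Counting operations in the function above, every product involves only the small factors, so per-step cost scales with the factor sizes rather than with the size of the implicit dense matrix, consistent with the efficiency argument the quoted section goes on to make.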