Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs
2016, Preprint
DOI: 10.48550/arxiv.1612.05231

Cited by 36 publications (30 citation statements). References 10 publications.
“…A unitary matrix features eigenvalues of unit modulus and reversibility. It is widely used as an approach to ease the gradient exploding and vanishing problem (Arjovsky et al, 2015; Wisdom et al, 2016; Jing et al, 2016) and the memory wall problem (Luo et al, 2019). One of the simplest ways to parametrize a unitary matrix is to represent it as a product of two-level unitary operations (Jing et al, 2016).…”
Section: Unitary Matrices
confidence: 99%
“…It is widely used as an approach to ease the gradient exploding and vanishing problem (Arjovsky et al, 2015; Wisdom et al, 2016; Jing et al, 2016) and the memory wall problem (Luo et al, 2019). One of the simplest ways to parametrize a unitary matrix is to represent it as a product of two-level unitary operations (Jing et al, 2016). A real unitary matrix of size N can be parametrized compactly by N(N − 1)/2 rotation operations (Li et al, 2013).…”
Section: Unitary Matrices
confidence: 99%
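As a concrete illustration of the parametrization these excerpts describe, here is a minimal NumPy sketch (not the paper's EUNN implementation; `givens` and `orthogonal_from_rotations` are hypothetical helper names) that composes N(N − 1)/2 two-level Givens rotations into a real orthogonal matrix:

```python
# Minimal sketch (not the paper's EUNN code): compose N*(N-1)/2 two-level
# (Givens) rotations into a real orthogonal matrix.
import numpy as np

def givens(n, i, j, theta):
    """Identity matrix except for a 2x2 rotation acting on coordinates i and j."""
    g = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    g[i, i] = g[j, j] = c
    g[i, j], g[j, i] = -s, s
    return g

def orthogonal_from_rotations(thetas, n):
    """Multiply the N(N-1)/2 two-level rotations into a dense orthogonal matrix."""
    assert len(thetas) == n * (n - 1) // 2
    q = np.eye(n)
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            q = givens(n, i, j, thetas[k]) @ q
            k += 1
    return q

if __name__ == "__main__":
    n = 4
    rng = np.random.default_rng(0)
    thetas = rng.uniform(0.0, 2.0 * np.pi, n * (n - 1) // 2)
    q = orthogonal_from_rotations(thetas, n)
    print(np.allclose(q @ q.T, np.eye(n)))  # True: q is orthogonal by construction
```

In the complex (unitary) case each two-level block carries additional phase parameters; the EUNN construction, per the paper's title, makes the number of such layers tunable to trade expressivity against cost.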
“…IGLOO may be less impacted by permutation than RNN style structures because it is finding a representation for a sequence not by looking at each element sequentially but as a whole, taking patches from the whole input space. Reported accuracies (MNIST / pMNIST, %):
(Le et al, 2015): 97.0 / 82.0
uRNN (Arjovsky et al, 2016): 95.1 / 91.4
LSTM: 98.3 / 89.4
EURNN (Jing et al, 2016): - / 93.7
TCN (Bai et al, 2018): 99.0 / 97.2
r-LSTM (Trinh et al, 2018): 98.4 / 95.2
IndRNN (Li et al, 2018): 99.0 / 96.0
KRU (Jose et al, 2017): 96.4 / 94.5
Dilated GRU (Chang et al, 2018): [values truncated in the excerpt]
We note that while IGLOO and the CuDNN LSTM run at a similar speed of about 30 seconds per epoch, the plain LSTM is much slower, taking about 540 seconds per epoch for a cell with 128 hidden units. We therefore achieve superior accuracy on the pMNIST benchmark at per-epoch speeds similar to the fast NVIDIA-optimized CuDNN LSTM cell.…”
Section: Sequential MNIST and Permuted MNIST
confidence: 99%
“…Dealing with very long term dependencies is a current area of research, and recent papers have introduced new variations which aim at fixing this issue and improving on the historical models: IndRNN (Shai et al, 2018) and RNNs with auxiliary losses (Trinh et al, 2018). Earlier works also include the uRNN (Arjovsky et al, 2016), Quasi-Recurrent Neural Networks (Q-RNN) (Bradbury et al, 2016), Dilated RNN (Chang et al, 2017), Recurrent Additive Networks (Lee et al, 2017), ChronoNet (Roy et al, 2018), EUNN (Jing et al, 2016), Kronecker Recurrent Units (KRU) (Jose et al, 2017) and the Recurrent Weighted Average (Ostmeyer et al, 2017).…”
Section: Introduction
confidence: 99%
“…Unitary recurrent neural networks [36,37,38] refine vanilla RNNs by parametrizing their transition matrix to be unitary. These networks are reversible in exact arithmetic [36]: the conjugate transpose of the transition matrix is its inverse, so the hidden-to-hidden transition is reversible.…”
Section: Related Work
confidence: 99%
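The reversibility noted in this last excerpt follows from unitarity alone: since W^H W = I, the linear hidden-to-hidden step can be inverted exactly. A minimal NumPy sketch under that assumption (omitting the nonlinearity and the input term of a full RNN cell):

```python
# Minimal sketch of the reversibility claim, assuming a purely linear
# unitary transition h_t = W h_{t-1}; the nonlinearity and input term of a
# full RNN cell are omitted here.
import numpy as np

rng = np.random.default_rng(1)
n = 8

# Draw a random unitary W: the Q factor of a complex Gaussian matrix.
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
w, _ = np.linalg.qr(a)

h_prev = rng.standard_normal(n) + 1j * rng.standard_normal(n)
h_next = w @ h_prev                 # forward hidden-to-hidden transition
h_recovered = w.conj().T @ h_next   # reverse it with the conjugate transpose

print(np.allclose(w.conj().T @ w, np.eye(n)))  # True: W^H W = I (unitary)
print(np.allclose(h_recovered, h_prev))        # True: transition reversed exactly
```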