2016 23rd International Conference on Pattern Recognition (ICPR)
DOI: 10.1109/icpr.2016.7900183
Faster training of very deep networks via p-norm gates

Abstract: A major contributing factor to the recent advances in deep neural networks is structural units that let sensory information and gradients propagate easily. Gating is one such structure, acting as a flow control. Gates are employed in many recent state-of-the-art recurrent models such as LSTM and GRU, and in feedforward models such as Residual Nets and Highway Networks. This enables learning in very deep networks with hundreds of layers and helps achieve record-breaking results in vision (e.g., ImageNet …
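The abstract is cut off before it defines the p-norm gate itself. As a rough illustration of what the title suggests, here is a minimal sketch assuming the gate generalizes the standard highway coupling t + c = 1 to t**p + c**p = 1; the function name, parameterization, and this exact constraint are assumptions, not the paper's confirmed formulation.

```python
import torch

def highway_pnorm(x, W_h, b_h, W_t, b_t, p=2.0):
    """One highway-style layer with an assumed p-norm gate.

    Standard highway units couple the transform gate t and the carry
    gate c by t + c = 1.  Under the assumed coupling t**p + c**p = 1,
    choosing p > 1 lets both gates stay large at the same time, so more
    signal and gradient can pass through each layer.
    """
    h = torch.tanh(x @ W_h + b_h)      # candidate transformation
    t = torch.sigmoid(x @ W_t + b_t)   # transform gate in (0, 1)
    c = (1.0 - t**p) ** (1.0 / p)      # carry gate from the p-norm constraint
    return t * h + c * x

# Toy check: with p = 1 this reduces to the standard highway layer.
d = 8
x = torch.randn(1, d)
W_h, W_t = torch.randn(d, d) / d**0.5, torch.randn(d, d) / d**0.5
b = torch.zeros(d)
print(highway_pnorm(x, W_h, b, W_t, b, p=2.0).shape)  # torch.Size([1, 8])
```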

Cited by 15 publications (16 citation statements) · References 17 publications
“…Figure 2 shows the Long-Deep Recurrent Neural Network (LD-RNN) that we have designed for the story point prediction system. It is composed of four components arranged sequentially: (i) word embedding, (ii) document representation using Long Short-Term Memory (LSTM) [34], (iii) deep representation using Recurrent Highway Net (RHWN) [35]; and (iv) differentiable regression. Given a document which consists of a sequence of words s = (w_1, w_2, ..., w_n), e.g.…”
Section: Approach (mentioning, confidence: 99%)
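The quote lists the four LD-RNN components in order. Below is a minimal PyTorch-style sketch of that pipeline; the class name LDRNN, the hidden size, and the simple feedforward stand-in for the RHWN block (a shared-weight highway sketch appears after a later quote) are illustrative assumptions, not the cited paper's exact design.

```python
import torch
import torch.nn as nn

class LDRNN(nn.Module):
    """Sketch of the four LD-RNN components, in the order quoted."""
    def __init__(self, vocab_size, d=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)      # (i) word embedding
        self.lstm = nn.LSTM(d, d, batch_first=True)   # (ii) document representation
        self.deep = nn.Sequential(                    # (iii) stand-in for the RHWN
            nn.Linear(d, d), nn.Tanh(),               #      (see the shared-weight
            nn.Linear(d, d), nn.Tanh(),               #      highway sketch below)
        )
        self.head = nn.Linear(d, 1)                   # (iv) differentiable regression

    def forward(self, tokens):                        # tokens: (batch, seq) word ids
        emb = self.embed(tokens)                      # s = (w_1, ..., w_n) -> vectors
        _, (h_n, _) = self.lstm(emb)                  # last hidden state = doc vector
        return self.head(self.deep(h_n[-1]))         # scalar story-point estimate

model = LDRNN(vocab_size=1000)
print(model(torch.randint(0, 1000, (2, 30))).shape)  # torch.Size([2, 1])
```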
“…This gating scheme is highly effective: while traditional deep neural nets cannot go beyond several layers, the Highway Net can have up to a thousand layers [41]. In previous work [35] we found that the operation in Eq. (2) can be repeated multiple times with exactly the same set of parameters.…”
Section: B. Deep Representation Using Recurrent Highway Network (mentioning, confidence: 99%)
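The quote's key point is that the highway update can be iterated with one shared parameter set, so representational depth grows without growing the parameter count. A minimal sketch of that weight-tying idea follows; the module name, gate parameterization, and step count are assumptions for illustration.

```python
import torch
import torch.nn as nn

class RecurrentHighway(nn.Module):
    """Highway update applied `steps` times with a single shared weight set."""
    def __init__(self, d, steps=4):
        super().__init__()
        self.lin_h = nn.Linear(d, d)   # candidate transformation
        self.lin_t = nn.Linear(d, d)   # transform gate
        self.steps = steps

    def forward(self, x):
        for _ in range(self.steps):    # exactly the same parameters every step
            h = torch.tanh(self.lin_h(x))
            t = torch.sigmoid(self.lin_t(x))
            x = t * h + (1.0 - t) * x  # gated highway update (cf. the quoted Eq. (2))
        return x

layer = RecurrentHighway(d=16, steps=8)
print(sum(p.numel() for p in layer.parameters()))  # 544, independent of steps
```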
“…Their model is composed of four components: (1) Word Embedding, (2) Document representation using Long-Short Term Memory (LSTM) [22], [23], (3) Deep representation using Recurrent Highway Network (RHWN) [24], and (4) Differentiable Regression.…”
Section: Deep-SE (mentioning, confidence: 99%)
“…In case of recurrent nets, the query information is also propagated through the internal state of the controller. For simplicity, in this paper, we implement the controller and the memory updates using skip-connections [33], [34]…”
Section: Recurrent Skip-connections (mentioning, confidence: 99%)
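The quote says the controller and memory updates are implemented with skip-connections, which let the previous state pass directly to the next step and ease gradient flow. The sketch below shows one generic way to wrap a recurrent update in a skip-connection; the GRU cell, class name, and sizes are assumptions, not the cited papers' actual implementation.

```python
import torch
import torch.nn as nn

class SkipGRUCell(nn.Module):
    """Recurrent controller step with an additive skip-connection."""
    def __init__(self, d_in, d_state):
        super().__init__()
        self.cell = nn.GRUCell(d_in, d_state)

    def forward(self, x, state):
        # The skip term carries the previous state (and hence the query
        # information propagated through it) straight to the next step.
        return state + self.cell(x, state)

cell = SkipGRUCell(8, 16)
state = torch.zeros(1, 16)
for _ in range(5):                     # unrolled controller steps
    state = cell(torch.randn(1, 8), state)
print(state.shape)  # torch.Size([1, 16])
```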