2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2016.7472159
Batch normalized recurrent neural networks

Abstract: Recurrent Neural Networks (RNNs) are powerful models for sequential data that have the potential to learn long-term dependencies. However, they are computationally expensive to train and difficult to parallelize. Recent work has shown that normalizing intermediate representations of neural networks can significantly improve convergence rates in feedforward neural networks [1]. In particular, batch normalization, which uses mini-batch statistics to standardize features, was shown to significantly reduce training…
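For context, here is a minimal sketch (not the authors' code; names and shapes are illustrative assumptions) of the batch normalization transform the abstract refers to: each feature is standardized with mini-batch statistics and then rescaled by learnable parameters.

import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (batch, features) mini-batch; gamma, beta: (features,) learnable scale/shift."""
    mean = x.mean(dim=0)                        # per-feature mini-batch mean
    var = x.var(dim=0, unbiased=False)          # per-feature mini-batch variance
    x_hat = (x - mean) / torch.sqrt(var + eps)  # standardized features
    return gamma * x_hat + beta                 # learnable rescale and shift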

Cited by 170 publications (116 citation statements)
References 14 publications
“…That system only contains the deep clustering part, which corresponds to α = 1 in the hybrid system. In the MIREX system, dropout layers with probability 0.2 were added between each feed-forward connection, and sequence-wise batch normalization [20] was applied in the input-to-hidden transformation in each BLSTM layer. Similarly to [13], we also applied a curriculum learning strategy [21], where we first train the network on segments of 100 frames, then train on segments of 500 frames.…”
Section: Evaluation and Discussion
confidence: 99%
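As a reading aid, the following is a minimal sketch, assuming PyTorch and a (time, batch, features) tensor layout, of the sequence-wise batch normalization the quote refers to: statistics are pooled over both the time and batch dimensions of the input-to-hidden pre-activations, so every time step shares the same normalization. The function name and shapes are assumptions for illustration, not the cited system's code.

import torch

def sequence_wise_batchnorm(pre_act, gamma, beta, eps=1e-5):
    """pre_act: (T, B, F) input-to-hidden pre-activations W_x x_t for a batch of
    sequences; gamma, beta: (F,) learnable scale and shift."""
    # Pool mean and variance over time (dim 0) and batch (dim 1), per feature.
    mean = pre_act.mean(dim=(0, 1), keepdim=True)
    var = pre_act.var(dim=(0, 1), unbiased=False, keepdim=True)
    normalized = (pre_act - mean) / torch.sqrt(var + eps)
    return gamma * normalized + beta

# Schematic use inside each (B)LSTM step: only the input-to-hidden term is
# normalized, i.e. gates_t = BN(W_x x_t) + W_h h_{t-1} + b.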
“…In [33], the authors suggest applying it to feed-forward connections only, while in [34] the normalization step is extended to recurrent connections, using separate statistics for each time step. In this work, we tried both approaches and observed comparable performance between them.…”
Section: Batch Normalization
confidence: 99%
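To make the contrast in the quote concrete, here is a schematic sketch (assumed shapes and names, not code from either reference) of keeping separate batch-normalization statistics per time step for the recurrent term, versus normalizing only the feed-forward term.

import torch
import torch.nn as nn

class PerTimestepBN(nn.Module):
    """Keeps an independent BatchNorm1d per time step (up to max_steps), so the
    recurrent pre-activations at step t are normalized with step-t statistics."""
    def __init__(self, num_features, max_steps):
        super().__init__()
        self.bns = nn.ModuleList(nn.BatchNorm1d(num_features) for _ in range(max_steps))

    def forward(self, x_t, t):
        # Reuse the last time step's statistics for sequences longer than max_steps.
        return self.bns[min(t, len(self.bns) - 1)](x_t)

# Feed-forward-only placement:  gates_t = BN(W_x x_t) + W_h h_{t-1} + b
# Recurrent placement:          gates_t = BN(W_x x_t) + BN_t(W_h h_{t-1}) + b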
“…There have been many research efforts, from the algorithm point of view, on binarizing or quantizing feedforward neural networks such as CNNs and MLPs [23], [24]. Binarizing an LSTM is more challenging than binarizing a CNN or MLP because it is difficult to adopt back-end techniques like batch normalization in a recurrent neural network [25]. Instead, quantized LSTMs have been studied, and it has been shown that low quantization bit-widths can be achieved by quantizing the weights and hidden state during forward propagation and using a straight-through estimator (STE) to propagate the gradient for weight updates [26], [27].…”
Section: Vector-Matrix Multiplication Accelerated by NVM Array
confidence: 99%
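For illustration only, the sketch below shows the straight-through estimator (STE) idea the quote mentions: weights are quantized to a low bit-width in the forward pass while gradients flow to the full-precision weights unchanged. The uniform quantizer and bit-width are assumptions, not the cited papers' exact scheme.

import torch

class QuantizeSTE(torch.autograd.Function):
    """Quantize in the forward pass, pass gradients straight through in backward."""
    @staticmethod
    def forward(ctx, w, bits):
        # Uniform symmetric quantization of weights clipped to [-1, 1].
        levels = 2 ** (bits - 1) - 1
        w_clipped = w.clamp(-1.0, 1.0)
        return torch.round(w_clipped * levels) / levels

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat quantization as the identity,
        # so the gradient reaches the underlying full-precision weights.
        return grad_output, None

# Usage (schematic): inside the LSTM's matrix multiplications, replace the
# full-precision weight w with QuantizeSTE.apply(w, 2); the optimizer still
# updates the stored full-precision w.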