RNNDROP: A novel dropout for RNNs in ASR

Moon, Taesup; Choi, Heeyoul; Lee, Hoshik; Song, Inchul

doi:10.1109/asru.2015.7404775

Cited by 74 publications (46 citation statements)

References 14 publications

“…The third row shows the improvement achieved when adding recurrent dropout. Similarly to [40,41], we applied the same dropout mask for all the time steps to avoid gradient vanishing problems. The fourth line, instead, shows the benefits derived from batch normalization [18].…”
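
The excerpt above describes reusing one dropout mask for all time steps of a sequence, which is the idea behind RNNDrop. The Python sketch below contrasts that with redrawing a mask at every step. It is illustrative only: the toy elementwise recurrence, the weights `w_in` and `w_rec`, and the helper names `sample_mask` and `rnn_forward` are hypothetical. Applying the mask to the previous hidden state (rather than, for example, an LSTM memory cell) is also an assumption, because the excerpt does not say where the mask is applied.

```python
import math
import random


def sample_mask(size, p, rng):
    """Inverted-dropout mask: keep each unit with probability 1 - p and scale
    kept units by 1 / (1 - p) so the expected value is unchanged."""
    keep = 1.0 - p
    return [1.0 / keep if rng.random() < keep else 0.0 for _ in range(size)]


def rnn_forward(inputs, w_in, w_rec, p, rng, same_mask=True):
    """Toy elementwise recurrence h_t = tanh(w_in * x_t + w_rec * (m * h_{t-1})).

    same_mask=True draws one mask m per sequence and reuses it at every time
    step (the per-sequence scheme); same_mask=False redraws m at each step.
    """
    size = len(inputs[0])
    h = [0.0] * size
    mask = sample_mask(size, p, rng)
    states, masks = [], []
    for x in inputs:
        if not same_mask:
            mask = sample_mask(size, p, rng)
        h = [math.tanh(w_in * xi + w_rec * mi * hi)
             for xi, mi, hi in zip(x, mask, h)]
        states.append(h)
        masks.append(mask)
    return states, masks


rng = random.Random(0)
sequence = [[1.0, 0.5, -0.5, 0.2]] * 6   # 6 time steps, 4 hidden units
states, masks = rnn_forward(sequence, w_in=0.8, w_rec=0.5, p=0.5, rng=rng)
```

With `same_mask=True`, every entry of `masks` is the same list, so a unit that is dropped stays dropped for the whole sequence.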

Section: Baselines (mentioning)

The Pytorch-kaldi Speech Recognition Toolkit

Ravanelli, Parcollet, Bengio

2019

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch is used to build neural networks with the Python language and has recently spawned tremendous interest within the machine learning community thanks to its simplicity and flexibility. The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. PyTorch-Kaldi is not only a simple interface between these tools, but it embeds several useful features for developing modern speech recognizers. For instance, the code is specifically designed to naturally plug in user-defined acoustic models. As an alternative, users can exploit several pre-implemented neural networks that can be customized using intuitive configuration files. PyTorch-Kaldi supports multiple feature and label streams as well as combinations of neural networks, enabling the use of complex neural architectures. The toolkit is publicly released along with rich documentation and is designed to work properly locally or on HPC clusters. Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.

“…We use Adam [14] for optimization with a learning rate of $1 \times 10^{-3}$ and set $\beta_1 = 0.9$, $\beta_2 = 0.99$. RnnDrop [24,7] are used in recurrent layers to prevent overfitting.…”
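
To make the quoted optimizer settings concrete, the sketch below runs the standard Adam update (bias-corrected first and second moment estimates) on a single scalar parameter. The learning rate, `beta1`, and `beta2` match the excerpt; `eps = 1e-8`, the function name `adam_minimize`, and the toy quadratic objective are assumptions chosen for illustration.

```python
import math


def adam_minimize(grad, x0, lr=1e-3, beta1=0.9, beta2=0.99, eps=1e-8, steps=5000):
    """Scalar Adam. lr, beta1 and beta2 default to the quoted values;
    eps = 1e-8 is an assumed (commonly used) value."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1.0 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1.0 - beta1 ** t)           # bias correction for zero init
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Hypothetical toy objective f(x) = (x - 3)^2 with gradient 2(x - 3).
x_final = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

While the gradient keeps the same sign, the ratio `m_hat / sqrt(v_hat)` stays close to ±1, so each step moves `x` by roughly `lr`; Adam's step size is therefore largely insensitive to the scale of the gradient.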

Section: Network Architecture (mentioning)

Recurrent Autoregressive Networks for Online Multi-object Tracking

Fang, Xiang, et al.

2018

2018 IEEE Winter Conference on Applications of Computer Vision (WACV)

The main challenge of online multi-object tracking is to reliably associate object trajectories with detections in each video frame based on their tracking history. In this work, we propose the Recurrent Autoregressive Network (RAN), a temporal generative modeling framework to characterize the appearance and motion dynamics of multiple objects over time. The RAN couples an external memory and an internal memory. The external memory explicitly stores previous inputs of each trajectory in a time window, while the internal memory learns to summarize long-term tracking history and associate detections by processing the external memory. We conduct experiments on the MOT 2015 and 2016 datasets to demonstrate the robustness of our tracking method in highly crowded and occluded scenes. Our method achieves top-ranked results on the two benchmarks.
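
The abstract describes an external memory that holds a trajectory's recent inputs in a fixed time window. The Python sketch below shows such a buffer, assuming that a prediction is formed autoregressively as a weighted average of the stored inputs. The class name `ExternalMemory`, the softmax weighting, and the hand-supplied scores (standing in for whatever the learned internal memory would produce) are our assumptions, not the paper's implementation.

```python
import math
from collections import deque


class ExternalMemory:
    """Sliding window of the most recent inputs of one trajectory;
    deque(maxlen=...) evicts the oldest entry automatically."""

    def __init__(self, window):
        self.buffer = deque(maxlen=window)

    def push(self, feature):
        self.buffer.append(list(feature))

    def predict(self, scores):
        """Softmax(scores)-weighted sum of the stored inputs. The scores are a
        hypothetical stand-in for weights a learned internal memory would give."""
        if not self.buffer or len(scores) != len(self.buffer):
            raise ValueError("need exactly one score per stored entry")
        top = max(scores)                          # subtract max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(self.buffer[0])
        return [sum(w * entry[d] for w, entry in zip(weights, self.buffer))
                for d in range(dim)]


memory = ExternalMemory(window=3)
for step in range(4):                # push 4 inputs; the first is evicted
    memory.push([float(step), 2.0 * step])
print(list(memory.buffer))           # [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
```

Equal scores give the plain mean of the window, while a dominant score makes the prediction follow a single stored input.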

“…In addition to feed-forward layers, dropout can be applied to the convolutional or the recurrent layers. To preserve the spatial or temporal structure while dropping out random nodes, spatial dropout [20] and RnnDrop [21] were proposed for the convolutional and the recurrent layers, respectively. There are several papers that explain how dropout improves the performance [10,13,14], assuming that dropout avoids the co-adaptation problem without any question on it.…”
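
The distinction drawn in this excerpt is about which activations share one dropout decision. The plain-Python sketch below implements spatial dropout by making one keep/drop draw per channel instead of per activation. The function name and the inverted-dropout scaling by 1/(1 - p) are assumptions for illustration; sharing a single draw per unit across time steps, rather than across spatial positions, gives the recurrent-layer analogue.

```python
import random


def spatial_dropout(feature_maps, p, rng):
    """Spatial (channel-wise) dropout on a list of 2-D feature maps.

    One Bernoulli draw per channel: the entire map is either zeroed or kept
    and scaled by 1 / (1 - p) (inverted dropout), so activations within a
    map are never split apart by the mask.
    """
    keep = 1.0 - p
    out = []
    for fmap in feature_maps:
        scale = 1.0 / keep if rng.random() < keep else 0.0
        out.append([[scale * v for v in row] for row in fmap])
    return out


rng = random.Random(42)
maps = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(8)]   # 8 channels of 2x2
dropped = spatial_dropout(maps, p=0.5, rng=rng)
```

Because a dropped channel is removed entirely, the network cannot recover it from strongly correlated neighbouring positions in the same map, which is the motivation usually given for dropping whole feature maps.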

Section: Dropout (mentioning)

Understanding dropout as an optimization trick

Hahn, Choi

2020

Neurocomputing

Self Cite

As one of the standard approaches to training deep neural networks, dropout has been applied to regularize large models and avoid overfitting, and the performance improvement from dropout has been explained as avoiding co-adaptation between nodes. However, when correlations between nodes are compared after training networks with and without dropout, the question arises whether dropout really avoids co-adaptation. In this paper, we propose a new explanation of why dropout works and propose a new technique to design better activation functions. First, we show that dropout can be explained as an optimization technique that pushes the input towards the saturation area of the nonlinear activation function by accelerating gradient information flowing even in the saturation area during backpropagation. Based on this explanation, we propose a new technique for activation functions, gradient acceleration in activation function (GAAF), that accelerates gradients to flow even in the saturation area. The input to the activation function can then climb onto the saturation area, which makes the network more robust because the model converges on a flat region. Experimental results support our explanation of dropout and confirm that the proposed GAAF technique improves image classification performance with the expected properties.
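
The abstract names GAAF but does not give its formula, so the sketch below is only our illustration of the stated idea for a sigmoid. It assumes a sawtooth term g(x) = (Kx - floor(Kx) - 0.5)/K, whose value stays within ±0.5/K while its derivative is 1 almost everywhere, multiplied by a hypothetical shape function s(x) that is 0 at x = 0 and approaches 1 in the saturation area. The exact functions used in the paper may differ, and the names `sawtooth`, `shape`, `gaaf_sigmoid`, and `gaaf_sigmoid_grad` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sawtooth(x, k):
    """g(x) = (k*x - floor(k*x) - 0.5) / k: |g(x)| <= 0.5 / k, and
    g'(x) = 1 wherever k*x is not an integer."""
    return (k * x - math.floor(k * x) - 0.5) / k


def shape(x):
    """Hypothetical shape function s(x) = 1 - 4*sigma(x)*(1 - sigma(x)):
    0 at x = 0 and close to 1 in the saturation area."""
    s = sigmoid(x)
    return 1.0 - 4.0 * s * (1.0 - s)


def gaaf_sigmoid(x, k=1000.0):
    """Forward value: sigmoid plus a perturbation bounded by 0.5 / k."""
    return sigmoid(x) + sawtooth(x, k) * shape(x)


def gaaf_sigmoid_grad(x, k=1000.0):
    """Derivative away from the sawtooth's jumps:
    sigma'(x) + g'(x) * s(x) + g(x) * s'(x), with g'(x) = 1."""
    s = sigmoid(x)
    ds = s * (1.0 - s)                      # sigma'(x)
    dshape = -4.0 * ds * (1.0 - 2.0 * s)    # s'(x)
    return ds + shape(x) + sawtooth(x, k) * dshape
```

At x = 0 the extra term contributes nothing and the derivative is the plain sigmoid slope of 0.25, while deep in the saturation area the derivative stays close to 1 even though the forward value moves by at most 0.5/K.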