2019
DOI: 10.1007/s10032-019-00325-0

Are 2D-LSTM really dead for offline text recognition?

Abstract: There is a recent trend in handwritten text recognition with deep neural networks to replace 2D recurrent layers with 1D ones, and in some cases even to remove the recurrent layers completely, relying on simple feed-forward, convolution-only architectures. The most used type of recurrent layer is the Long Short-Term Memory (LSTM). The motivations to do so are many: there are few open-source implementations of 2D-LSTM, even fewer supporting GPU implementations (currently cuDNN only implements 1D-LSTM); 2D recurrences r…
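
As a concrete illustration of the two architecture families the abstract contrasts, below is a minimal sketch of a CNN + 1D-LSTM line recognizer in PyTorch. Layer sizes, the class name, and the overall layout are illustrative assumptions, not the architecture evaluated in the paper.

# Minimal sketch of a CNN + 1D-LSTM handwriting line recognizer.
# Layer sizes are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class CnnLstmRecognizer(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Convolutional feature extractor: shrinks the image height
        # while keeping the width as the "time" axis.
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 1D recurrence over the horizontal axis only; this is the case
        # that cuDNN can accelerate, unlike a 2D recurrence.
        self.lstm = nn.LSTM(64 * 8, 128, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * 128, num_classes)

    def forward(self, x):  # x: (batch, 1, 32, width)
        f = self.features(x)                             # (batch, 64, 8, width/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # (batch, time, features)
        out, _ = self.lstm(f)
        return self.classifier(out)                      # per-timestep class scores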

Cited by 26 publications (16 citation statements). References 27 publications.

“…[table fragment: parameters, training time (min/epoch) and prediction time (ms/sample) per model: 2D-LSTM [6], 0.8 M; 2D-LSTM-X2 [6], 3.3 M; CNN + 1D-LSTM [5,6], 11 …] As we can see, among the best models ours is the one with the lowest number of parameters. The training time and prediction time of the CNN + 1D-LSTM [5,6] and of our model are of the same order of magnitude. This can be explained by the high number of normalization layers used in our model and by its depth, which counterbalance the sequential computations of the LSTM layers.…”
Section: Architecture
confidence: 96%
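
Parameter counts like the "0.8 M" and "3.3 M" figures quoted above can be reproduced for any PyTorch model with a short snippet; a minimal sketch, assuming the hypothetical CnnLstmRecognizer from the earlier example is in scope:

# Count trainable parameters; this is how figures such as "0.8 M"
# or "3.3 M" are typically obtained for a model.
model = CnnLstmRecognizer(num_classes=80)
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params / 1e6:.1f} M parameters")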
“…In [5], a 1D-LSTM reaches better results than a 2D-LSTM with less training time but more parameters. More recently, a 2D-LSTM presented in [6] showed competitive prediction time and performance across several datasets such as RIMES [7] and IAM [8], as well as more complex ones like MAURDOR [9].…”
Section: A Recurrent Neural Network (RNN)
confidence: 99%
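
To make the 2D recurrence concrete: in a 2D-LSTM, the state at each pixel position depends on both its left and its top neighbour, which is why the scan cannot be unrolled along a single time axis the way cuDNN accelerates 1D-LSTMs. Below is a naive single-direction sketch; the gating scheme is a common simplification and is not taken from any specific paper cited here.

# Naive single-direction 2D-LSTM scan (top-left to bottom-right).
# Illustrative sketch only: each position (i, j) depends on the hidden
# and cell states of its left (i, j-1) and top (i-1, j) neighbours,
# so the two nested loops below are inherently sequential.
import torch
import torch.nn as nn

class Naive2DLSTM(nn.Module):
    def __init__(self, in_dim: int, hid: int):
        super().__init__()
        self.hid = hid
        # One linear map producing input, two forget, output and cell gates.
        self.gates = nn.Linear(in_dim + 2 * hid, 5 * hid)

    def forward(self, x):  # x: (H, W, in_dim)
        H, W, _ = x.shape
        zero = x.new_zeros(self.hid)
        h = [[zero] * (W + 1) for _ in range(H + 1)]  # padded hidden states
        c = [[zero] * (W + 1) for _ in range(H + 1)]  # padded cell states
        for i in range(1, H + 1):
            for j in range(1, W + 1):
                z = self.gates(torch.cat([x[i-1, j-1], h[i-1][j], h[i][j-1]]))
                i_g, f_top, f_left, o_g, g = z.chunk(5)
                c[i][j] = (torch.sigmoid(f_top) * c[i-1][j]
                           + torch.sigmoid(f_left) * c[i][j-1]
                           + torch.sigmoid(i_g) * torch.tanh(g))
                h[i][j] = torch.sigmoid(o_g) * torch.tanh(c[i][j])
        return torch.stack([torch.stack(row[1:]) for row in h[1:]])  # (H, W, hid)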
“…Recently, there has been a trend in handwritten text recognition with deep neural networks to replace 2D recurrent layers with 1D ones, and in some cases to remove the recurrent layers entirely, relying on simple feed-forward, convolution-only architectures. A more detailed discussion can be found in the paper of Moysset and Messina. On the other hand, the same authors show that 2D-LSTM networks still seem to provide the highest performance.…”
Section: Literature Review
confidence: 99%
“…These models are trained to minimize the connectionist temporal classification (CTC) cost function proposed by Graves in [17]. In some works, 2D-LSTM [16] networks are used [18]-[21]. This type of RNN has two main drawbacks.…”
Section: Related Work and Contributions, A DNN Model
confidence: 99%
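
For reference, the CTC cost function by Graves that these models minimize is available directly in PyTorch; a minimal usage sketch with arbitrary illustrative sizes:

# Minimal CTC loss usage (connectionist temporal classification).
# All sizes below are arbitrary illustrative values.
import torch
import torch.nn as nn

T, B, C = 50, 4, 80  # time steps, batch size, classes (index 0 = blank)
log_probs = torch.randn(T, B, C, requires_grad=True).log_softmax(2)
targets = torch.randint(1, C, (B, 12))             # label sequences (no blanks)
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 12, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # gradients flow back through the per-timestep outputs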