2019 International Conference on Document Analysis and Recognition Workshops (ICDARW)
DOI: 10.1109/icdarw.2019.40084
ReELFA: A Scene Text Recognizer with Encoded Location and Focused Attention

Abstract: LSTMs and attention mechanisms have been widely used for scene text recognition. However, existing LSTM-based recognizers usually convert 2D feature maps into 1D space by flattening or pooling operations, neglecting the spatial information of text images. Additionally, the attention drift problem, where models fail to align targets with the proper feature regions, seriously degrades the recognition performance of existing models. To tackle the above problems, in this paper, we propose a scene text Re…
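To make the abstract's first complaint concrete, here is a minimal PyTorch sketch (not from the paper; tensor sizes are assumptions) of the 2D-to-1D collapse it refers to: a CNN feature map of shape (batch, channels, height, width) is pooled over the height axis so an LSTM can consume it as a sequence of width steps, discarding vertical spatial information.

```python
import torch

# Hypothetical CNN backbone output: (B, C, H, W)
feature_map = torch.randn(1, 512, 8, 32)

# Collapse the height dimension by average pooling, then treat each
# image column as one time step of a 1D sequence. The vertical layout
# of the glyphs is lost here, which is the problem the paper targets.
pooled = feature_map.mean(dim=2)        # (B, C, W)
sequence = pooled.permute(0, 2, 1)      # (B, W, C): one step per column

lstm = torch.nn.LSTM(input_size=512, hidden_size=256, batch_first=True)
outputs, _ = lstm(sequence)             # (B, W, 256)
```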

Cited by 5 publications (3 citation statements) · References 20 publications (46 reference statements)
“…Motivated by the successes of [180], [181], [68], Su et al [131] used the histogram of oriented gradients (HOG) feature [182] in their STR system to construct sequential features of word images. Later, CNNs [138], [140], [141], [143], [72] have been widely used for feature representation stage, such as the VGGNet [183], [116], [178], [133], [71], [137], [162]. For more powerful feature representation, some complex neural networks were applied in STR algorithms, such as ResNet [184] [135], [145], [148], [73], [151], [166], [152], [154], [48], [155], [157], [158], [161], [163], [164] and DenseNet [185], [150], [160].…”
Section: Feature Representation Stage
Mentioning (confidence: 99%)
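The quoted survey mentions Su et al. building sequential features of word images from HOG descriptors. A hedged sketch of that general idea follows (window size, stride, and HOG parameters are assumptions, not taken from Su et al.): slide a narrow window across the word image and describe each window with HOG, yielding one feature vector per horizontal step.

```python
import numpy as np
from skimage.feature import hog

def hog_sequence(word_image: np.ndarray, window: int = 8, step: int = 4):
    """Turn a grayscale word image (H, W) into a sequence of HOG vectors,
    one per sliding-window position along the horizontal axis."""
    h, w = word_image.shape
    features = []
    for x in range(0, w - window + 1, step):
        patch = word_image[:, x:x + window]
        features.append(hog(patch, orientations=9,
                            pixels_per_cell=(4, 4),
                            cells_per_block=(2, 2)))
    return np.stack(features)            # (num_steps, feature_dim)

# Example: a 32x128 grayscale word image yields a sequence of 31 vectors.
seq = hog_sequence(np.random.rand(32, 128))
```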
“…Multiple bidirectional long short term memory (BiLSTM) model was introduced in [188] and widely used in [131], [116], [178], [135], [136], [139], [140], [141], [148], [150], [73], [72], [151], [154], [48], [155], [157], [158], [162], [166] as the sequence modeling module because of its ability to capture long-range dependencies. Litman et al [169] added intermediate supervisions along the network layers and successfully trained a deeper BiLSTM model to improve the encoding of contextual dependencies.…”
Section: Sequence Modeling Stage
Mentioning (confidence: 99%)
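The BiLSTM sequence modeling module this statement describes is a standard component; below is a minimal PyTorch sketch (layer sizes and the projection layer are assumptions, not a specific recognizer's configuration) of stacked bidirectional LSTMs refining per-column CNN features with long-range context.

```python
import torch
from torch import nn

class BiLSTMEncoder(nn.Module):
    """Stacked bidirectional LSTMs over a (B, T, C) feature sequence."""
    def __init__(self, in_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, in_dim)  # fuse both directions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(x)     # (B, T, 2 * hidden)
        return self.proj(out)    # (B, T, in_dim), context-enriched features

encoder = BiLSTMEncoder()
contextual = encoder(torch.randn(2, 32, 512))   # (2, 32, 512)
```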