2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018)
DOI: 10.1109/cvpr.2018.00163

Edit Probability for Scene Text Recognition

Abstract: We consider the scene text recognition problem under the attention-based encoder-decoder framework, which is the state of the art. The existing methods usually employ a frame-wise maximum likelihood loss to optimize the models. When we train the model, the misalignment between the ground-truth strings and the attention's output sequences of probability distributions, which is caused by missing or superfluous characters, will confuse and mislead the training process, and consequently make the training costly and…
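A minimal PyTorch sketch of the frame-wise loss the abstract refers to may help; the function name `frame_wise_nll` is hypothetical and not from the paper. It shows why a single missing or superfluous character is damaging: the loss pairs decoder step t with ground-truth position t, so one shift misaligns every later pair.

```python
# A minimal sketch (not the paper's code) of the frame-wise maximum
# likelihood loss used by attention decoders: decoding step t is scored
# against the ground-truth character at the same position t.
import torch
import torch.nn.functional as F

def frame_wise_nll(log_probs: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """log_probs: (T, num_classes) per-step log-probabilities from the decoder.
    target:    (T,) ground-truth character indices, position-aligned."""
    # NLL at step t uses target[t]; one missing or extra character shifts
    # every subsequent (log_probs[t], target[t]) pair out of alignment,
    # so all later steps are penalized against the wrong labels.
    return F.nll_loss(log_probs, target)
```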

Cited by 171 publications (115 citation statements) · References 28 publications
“…Shi et al. [31] improved such a CNN+RNN+CTC framework by making it end-to-end trainable, with significant performance gains. Recently, the framework was further improved by introducing various attention mechanisms, which are able to encode more character information explicitly or implicitly [33,4,1,12].…”
Section: Related Work
confidence: 99%
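The CNN+RNN+CTC recipe this statement describes can be sketched with standard PyTorch modules; `TinyCRNN` and all layer sizes below are illustrative assumptions, not the cited architecture.

```python
# A hedged sketch of the end-to-end trainable CNN+RNN+CTC pipeline.
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):  # hypothetical name, illustrative sizes
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        # CNN encoder: collapse image height into a 1D feature sequence.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),      # (B, 128, 1, W')
        )
        self.rnn = nn.LSTM(128, hidden, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # class 0 = CTC blank

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(images).squeeze(2)       # (B, 128, W')
        feats = feats.permute(2, 0, 1)            # (W', B, 128) for the LSTM
        out, _ = self.rnn(feats)
        return self.fc(out).log_softmax(-1)       # (T, B, num_classes)

# CTC training needs no per-frame alignment between image columns
# and target characters, which is what makes the recipe end-to-end.
model = TinyCRNN(num_classes=37)                  # 36 symbols + blank
log_probs = model(torch.randn(4, 1, 32, 128))     # batch of 4 word images
targets = torch.randint(1, 37, (4, 6))            # 6-char labels, no blanks
loss = nn.CTCLoss(blank=0)(
    log_probs, targets,
    input_lengths=torch.full((4,), log_probs.size(0), dtype=torch.long),
    target_lengths=torch.full((4,), 6, dtype=torch.long),
)
```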
“…It is observed that the performance of attention and CTC on all the datasets degrades as the shuffle ratio increases. Specifically, attention is more sensitive than CTC because the misalignment problem can easily mislead the training process of attention [3]. In contrast, the proposed ACE loss function exhibits similar recognition results for all settings of the shuffle ratio, because it only requires the classes and their counts for supervision, completely omitting character-order information.…”
Section: Results
confidence: 96%
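A short sketch may clarify why shuffling characters cannot affect the ACE loss; it follows the formulation reported in the ACE paper, with the function and variable names chosen here for illustration.

```python
# A hedged sketch of aggregation cross-entropy (ACE): supervision uses
# only the classes in the label and their counts, never their order.
import torch

def ace_loss(probs: torch.Tensor, label_counts: torch.Tensor) -> torch.Tensor:
    """probs:        (T, num_classes) per-step class probabilities.
    label_counts: (num_classes,) character counts; index 0 is the blank."""
    T = probs.size(0)
    counts = label_counts.clone().float()
    counts[0] = T - counts[1:].sum()        # blank absorbs the unused steps
    agg = probs.sum(0) / T                  # aggregate prediction over time
    return -(counts / T * (agg + 1e-10).log()).sum()

# Shuffling the label's characters leaves label_counts, and hence the
# loss, unchanged -- matching the robustness the statement reports.
```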
“…The attention mechanism was first proposed in machine translation [1,42] to enable a model to automatically search for parts of a source sentence for prediction. Then, the method rapidly became popular in applications such as (visual) question answering [32,52], image caption generation [50,52,31], speech recognition [2,25,32] and scene text recognition [39,3,19]. Most importantly, the attention mechanism can also be applied to 2D predictions, such as mathematical expression recognition [56,57] and paragraph recognition [4,5,46].…”
Section: Attention Mechanism
confidence: 99%
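The "search for parts of a source sentence" idea can be made concrete with a minimal additive (Bahdanau-style) attention module; the class and variable names are illustrative, not from any of the cited papers.

```python
# A minimal sketch of additive attention: score each encoder position
# against the current decoder state, then take a weighted sum.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):  # hypothetical name
    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states: torch.Tensor, dec_state: torch.Tensor):
        """enc_states: (L, enc_dim) source features; dec_state: (dec_dim,)."""
        # e_i = v^T tanh(W_e h_i + W_d s): relevance of source position i
        scores = self.v(torch.tanh(self.w_enc(enc_states) + self.w_dec(dec_state)))
        alpha = scores.squeeze(-1).softmax(0)             # attention weights
        context = (alpha.unsqueeze(-1) * enc_states).sum(0)  # glimpse vector
        return context, alpha
```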
“…Generally, in the encoding stage, a convolutional neural network (CNN) is used to extract features from the input image, whereas in the decoding stage, the encoded feature vectors are transcribed into target strings by exploiting a recurrent neural network (RNN) [12], [13], connectionist temporal classification (CTC) [14] or an attention mechanism [15]. In particular, the attention-based approaches [4], [11], [16], [17], [18] often achieve better performance owing to their focus on informative areas.…”
Section: Introduction
confidence: 99%
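To show how the CTC decoding stage mentioned above turns per-frame outputs into a target string, here is a hedged sketch of greedy CTC transcription; the function name is illustrative.

```python
# Greedy CTC transcription: take the best class per frame, then
# collapse consecutive repeats and drop blanks to obtain the string.
import torch

def ctc_greedy_decode(log_probs: torch.Tensor, blank: int = 0) -> list[int]:
    """log_probs: (T, num_classes) frame-wise log-probabilities."""
    best = log_probs.argmax(-1).tolist()    # best class per frame
    out, prev = [], blank
    for k in best:
        if k != prev and k != blank:        # collapse repeats, drop blanks
            out.append(k)
        prev = k
    return out
```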