2017 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2017.543

Focusing Attention: Towards Accurate Text Recognition in Natural Images

Abstract: Scene text recognition has been a hot research topic in computer vision due to its various applications. The state of the art is the attention-based encoder-decoder framework that learns the mapping between input images and output sequences in a purely data-driven way. However, we observe that existing attention-based methods perform poorly on complicated and/or low-quality images. One major reason is that existing methods cannot get accurate alignments between feature areas and targets for such images. We cal…
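As context for the alignment issue the abstract raises, below is a minimal PyTorch sketch of the additive-attention step used by such encoder-decoder recognizers. It is illustrative only, not the paper's implementation; all module and variable names are hypothetical. At each decoding step the decoder scores every encoder feature column and pools the best-aligned region; "attention drift" corresponds to these weights peaking on the wrong column.

```python
# Minimal sketch of one attention-alignment step in an encoder-decoder
# text recognizer. Names are illustrative, not from the paper.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.proj_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.proj_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.score = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_feats, dec_state):
        # enc_feats: (batch, width, enc_dim) feature columns from the CNN encoder
        # dec_state: (batch, dec_dim) current decoder hidden state
        energy = self.score(torch.tanh(
            self.proj_enc(enc_feats) + self.proj_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                        # (batch, width) alignment scores
        alpha = torch.softmax(energy, dim=1)  # attention weights over columns
        glimpse = (alpha.unsqueeze(-1) * enc_feats).sum(dim=1)  # (batch, enc_dim)
        # A misaligned alpha (peaking on the wrong column) is the failure
        # mode the abstract describes.
        return glimpse, alpha
```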

Cited by 461 publications (440 citation statements)
References 25 publications

“…This inevitably makes the models more complicated and difficult to train, requiring a significantly longer training time with a large amount of training samples. Recent work, such as [33,4,12], has shown that the performance of RNN-based methods can be improved considerably by introducing a character-level attention mechanism that encodes strong character information implicitly or explicitly. This enables the models to identify characters more accurately, and it adds constraints that reduce the search space, leading to a performance boost.…”
Section: Character Branch (mentioning)
confidence: 99%
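To make the "additional constraints" in the quote above concrete, the sketch below shows how character-level supervision turns the objective into one cross-entropy term per attended character, which is what narrows the search space relative to a purely sequence-level objective. It assumes per-step logits and per-character labels are available; the function name is hypothetical.

```python
# Hedged sketch: per-character supervision as a sum of per-step
# cross-entropy terms. Assumes one prediction per attended character.
import torch
import torch.nn.functional as F

def char_level_loss(step_logits, char_targets):
    """step_logits:  (batch, seq_len, num_classes), one prediction per step.
    char_targets: (batch, seq_len), ground-truth character index per step."""
    b, t, c = step_logits.shape
    # Each decoding step is constrained by its own character label.
    return F.cross_entropy(step_logits.reshape(b * t, c),
                           char_targets.reshape(b * t))
```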
“…Among them, the first group contains several well-known recognition networks, including CRNN [1] and GRCNN [5]. We then compare ours with previous attention-aware approaches such as FAN [8], FCN [14], and Baek et al. [16].…”
Section: Configuration (mentioning)
confidence: 99%
“…The attention mechanism is incorporated into the decoder. For example, Lee et al. [7] proposed an attention-based decoder for text-output prediction, while Cheng et al. [8] presented the Focusing Attention Network (FAN) to tackle the attention drift problem and improve regular text recognition. Besides, some previous work also attempted to handle irregular scene text images at the beginning of the encoder.…”
Section: Introduction (mentioning)
confidence: 99%
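The attention drift problem mentioned here can be expressed with a schematic penalty. This is a simplified sketch, not FAN's actual focusing network: it compares the attention distribution's center of mass against an annotated character center (assumed available, in feature-column units) and penalizes the gap.

```python
# Schematic drift penalty (not FAN's exact focusing network): penalize
# the distance between the attention center of mass and the known
# character-box center.
import torch

def drift_penalty(alpha, gt_center):
    """alpha:     (batch, width) attention weights from one decoder step.
    gt_center: (batch,) ground-truth character center, in column units."""
    positions = torch.arange(alpha.size(1), dtype=alpha.dtype,
                             device=alpha.device)
    attn_center = (alpha * positions).sum(dim=1)    # expected attended column
    return ((attn_center - gt_center) ** 2).mean()  # large when attention drifts
```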
“…Generally, in the encoding stage, a convolutional neural network (CNN) is used to extract features from the input image, whereas in the decoding stage, the encoded feature vectors are transcribed into target strings using a recurrent neural network (RNN) [12], [13], connectionist temporal classification (CTC) [14], or an attention mechanism [15]. In particular, the attention-based approaches [4], [11], [16], [17], [18] often achieve better performance owing to their focus on informative areas.…”
Section: Introduction (mentioning)
confidence: 99%
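To illustrate the encode-then-transcribe pipeline this quote describes, here is a minimal, self-contained PyTorch sketch using the CTC option. The architecture and sizes are illustrative assumptions, not any cited model: a small CNN collapses a 32-pixel-high image into a sequence of feature columns, a bidirectional GRU contextualizes them, and a linear head emits per-column log-probabilities.

```python
# Minimal encode-then-transcribe sketch with a CTC head.
# Architecture and sizes are illustrative only.
import torch
import torch.nn as nn

class TinyRecognizer(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.cnn = nn.Sequential(            # (b, 1, 32, W) -> (b, 64, 1, W/4)
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((16, 2)),
        )
        self.rnn = nn.GRU(64, 128, batch_first=True, bidirectional=True)
        self.head = nn.Linear(256, num_classes + 1)  # +1 for the CTC blank

    def forward(self, images):
        f = self.cnn(images).squeeze(2).permute(0, 2, 1)  # (b, steps, 64)
        h, _ = self.rnn(f)
        return self.head(h).log_softmax(-1)  # per-column class log-probs
```

For training, the (batch, steps, classes) output would be permuted to (steps, batch, classes), which is the layout nn.CTCLoss expects.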