2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.01198
SCATTER: Selective Context Attentional Scene Text Recognizer

Abstract: Scene Text Recognition (STR), the task of recognizing text against complex image backgrounds, is an active area of research. Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes. In this paper, we introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER). SCATTER utilizes a stacked block architecture with intermediate supervision during training, which paves the way to successfully train a deep BiLSTM encoder, thus impro…
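The stacked-block design the abstract describes can be sketched in a few lines. Below is a minimal PyTorch sketch, not the authors' implementation: the block class, the projection that keeps the feature width constant across blocks, the per-block auxiliary heads, and all dimensions are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class SelectiveContextBlock(nn.Module):
    """One stacked block: a two-layer BiLSTM encoder plus an auxiliary
    classification head used for intermediate supervision (assumed layout)."""
    def __init__(self, feat_dim: int, hidden: int, num_classes: int):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, feat_dim)       # keep width constant
        self.aux_head = nn.Linear(feat_dim, num_classes)  # intermediate supervision

    def forward(self, x):
        ctx, _ = self.encoder(x)        # (B, T, 2*hidden)
        ctx = self.proj(ctx)            # (B, T, feat_dim)
        return ctx, self.aux_head(ctx)  # features for the next block + aux logits

class ScatterSketch(nn.Module):
    """Stack of selective-context blocks over CNN feature columns."""
    def __init__(self, feat_dim=512, hidden=256, num_classes=97, num_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            SelectiveContextBlock(feat_dim, hidden, num_classes)
            for _ in range(num_blocks))

    def forward(self, visual_feats):
        # visual_feats: (B, T, feat_dim) feature columns from a CNN backbone.
        aux_logits = []
        x = visual_feats
        for block in self.blocks:
            x, logits = block(x)
            aux_logits.append(logits)   # every block is supervised during training
        return aux_logits               # the last entry serves as the final output
```

During training, each block's logits would receive a recognition loss, giving the deep BiLSTM stack a direct gradient signal; at inference only the deepest block's output is used.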

Cited by 128 publications (82 citation statements)
References 34 publications
“…Un- and semi-supervised learning for text recognition: Despite clear advantages, currently most text recognition methods do not utilize unlabeled real-world text images. Specifically, handwritten recognition usually relies on fully supervised training [64,59], while scene text models are trained mostly on synthetic data [3,38]. That said, [68] and [34] have recently suggested domain adaptation techniques to utilize an unlabeled dataset along with labeled data.…”
Section: Related Work
confidence: 99%
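The domain-adaptation techniques mentioned above share one training pattern: a supervised loss on labeled (often synthetic) images combined with an unsupervised term on unlabeled real images. An illustrative sketch of that combined objective under a pseudo-labeling scheme; the two-view augmentation, the `lambda_u` weight, and the per-character cross-entropy are assumptions, not any cited paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, labeled_batch, unlabeled_batch, lambda_u=0.5):
    """One combined update: supervised loss on labeled data plus a
    pseudo-label consistency term on unlabeled images (illustrative only)."""
    images, targets = labeled_batch       # targets: (B, T) character indices
    logits = model(images)                # (B, T, num_classes)
    sup_loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())

    weak, strong = unlabeled_batch        # two augmented views of the same images
    with torch.no_grad():
        pseudo = model(weak).argmax(dim=-1)          # pseudo-labels from weak view
    unsup_loss = F.cross_entropy(model(strong).flatten(0, 1), pseudo.flatten())

    return sup_loss + lambda_u * unsup_loss
```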
“…Similar to [18], many works [19]-[21] also introduce BiLSTM into the STR framework. Litman et al. [22] propose to apply a deeper BiLSTM model to improve the encoding of contextual dependencies and deploy intermediate supervision along the network layers. Different from [22], some works [23]-[25] do not use a BiLSTM structure, considering that BiLSTM is computationally intensive and time-consuming. Yin et al. [23] propose to simultaneously detect and recognize characters by sliding character models over the text-line image; the models are learned end-to-end on text-line images labeled only with text transcripts.…”
Section: Sequence Modeling
confidence: 99%
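The trade-off this statement describes, global BiLSTM context versus cheaper context-free modeling, can be made concrete. Below is a minimal sketch contrasting the two choices over the same CNN feature sequence; the dimensions and the 1D-convolution alternative are illustrative assumptions.

```python
import torch
import torch.nn as nn

B, T, D = 8, 26, 512          # batch size, sequence length, feature width (assumed)
feats = torch.randn(B, T, D)  # CNN feature columns, ordered left to right

# Contextual modeling: a BiLSTM sees the whole sequence but must run
# sequentially over all T steps, which is the cost that [23]-[25] avoid.
bilstm = nn.LSTM(D, 256, bidirectional=True, batch_first=True)
ctx, _ = bilstm(feats)        # (B, T, 512), each step conditioned on full context

# Context-free alternative: a 1D convolution mixes only a local window
# and parallelizes over T, trading global context for speed.
conv = nn.Conv1d(D, 512, kernel_size=3, padding=1)
local = conv(feats.transpose(1, 2)).transpose(1, 2)   # (B, T, 512)
```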
“…Zhan et al. [39] improved recognition performance by iteratively correcting the text image. Litman et al. [17] improved recognition by stacking the RNN decoder several times, and demonstrated the effectiveness of the stacked module.…”
Section: Irregular Text Recognition
confidence: 99%
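The stacking described here pairs naturally with the intermediate supervision mentioned in the abstract: each stacked block's output can receive its own loss during training. A minimal sketch of such a summed objective, written to consume the per-block logits returned by the `ScatterSketch` module sketched earlier; equal per-block weighting is an assumption.

```python
import torch
import torch.nn.functional as F

def stacked_supervision_loss(aux_logits_list, targets):
    """Average a recognition loss over every stacked block's output so that
    even the deepest layers receive a direct gradient signal (illustrative)."""
    total = 0.0
    for logits in aux_logits_list:        # one (B, T, C) logits tensor per block
        total = total + F.cross_entropy(
            logits.flatten(0, 1), targets.flatten())
    return total / len(aux_logits_list)   # equal weighting across blocks (assumed)
```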