2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.01198
SCATTER: Selective Context Attentional Scene Text Recognizer

Abstract: Scene Text Recognition (STR), the task of recognizing text against complex image backgrounds, is an active area of research. Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes. In this paper, we introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER). SCATTER utilizes a stacked block architecture with intermediate supervision during training, which paves the way to successfully train a deep BiLSTM encoder, thus impro…
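The stacked-block design the abstract describes can be sketched in a few lines. Below is a minimal PyTorch sketch, not the authors' implementation: the block class, the projection that keeps the feature width constant across blocks, the per-block auxiliary heads, and all dimensions are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class SelectiveContextBlock(nn.Module):
    """One stacked block: a two-layer BiLSTM encoder plus an auxiliary
    classification head used for intermediate supervision (assumed layout)."""
    def __init__(self, feat_dim: int, hidden: int, num_classes: int):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, feat_dim)       # keep width constant
        self.aux_head = nn.Linear(feat_dim, num_classes)  # intermediate supervision

    def forward(self, x):
        ctx, _ = self.encoder(x)        # (B, T, 2*hidden)
        ctx = self.proj(ctx)            # (B, T, feat_dim)
        return ctx, self.aux_head(ctx)  # features for the next block + aux logits

class ScatterSketch(nn.Module):
    """Stack of selective-context blocks over CNN feature columns."""
    def __init__(self, feat_dim=512, hidden=256, num_classes=97, num_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            SelectiveContextBlock(feat_dim, hidden, num_classes)
            for _ in range(num_blocks))

    def forward(self, visual_feats):
        # visual_feats: (B, T, feat_dim) feature columns from a CNN backbone.
        aux_logits = []
        x = visual_feats
        for block in self.blocks:
            x, logits = block(x)
            aux_logits.append(logits)   # every block is supervised during training
        return aux_logits               # the last entry serves as the final output
```

During training, each block's logits would receive a recognition loss, giving the deep BiLSTM stack a direct gradient signal; at inference only the deepest block's output is used.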

Cited by 128 publications (82 citation statements)
References 34 publications
“…Un- and semi-supervised learning for text recognition: Despite clear advantages, currently most text recognition methods do not utilize unlabeled real-world text images. Specifically, handwritten recognition usually relies on fully supervised training [64,59], while scene text models are trained mostly on synthetic data [3,38]. That said, [68] and [34] have recently suggested domain adaptation techniques to utilize an unlabeled dataset along with labeled data.…”
Section: Related Work
confidence: 99%
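The domain-adaptation techniques mentioned above share one training pattern: a supervised loss on labeled (often synthetic) images combined with an unsupervised term on unlabeled real images. An illustrative sketch of that combined objective under a pseudo-labeling scheme; the two-view augmentation, the `lambda_u` weight, and the per-character cross-entropy are assumptions, not any cited paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, labeled_batch, unlabeled_batch, lambda_u=0.5):
    """One combined update: supervised loss on labeled data plus a
    pseudo-label consistency term on unlabeled images (illustrative only)."""
    images, targets = labeled_batch       # targets: (B, T) character indices
    logits = model(images)                # (B, T, num_classes)
    sup_loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())

    weak, strong = unlabeled_batch        # two augmented views of the same images
    with torch.no_grad():
        pseudo = model(weak).argmax(dim=-1)          # pseudo-labels from weak view
    unsup_loss = F.cross_entropy(model(strong).flatten(0, 1), pseudo.flatten())

    return sup_loss + lambda_u * unsup_loss
```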
“…Similar to [18], many works [19]-[21] also introduce BiLSTM into the STR framework. Litman et al. [22] propose to apply a deeper BiLSTM model to improve the encoding of contextual dependencies and deploy intermediate supervision along the network layers. Different from [22], some works [23]-[25] do not use a BiLSTM structure, considering that BiLSTM is computationally intensive and time-consuming. Yin et al. [23] propose to simultaneously detect and recognize characters by sliding character models over the text-line image; the models are learned end-to-end on text-line images labeled only with text transcripts.…”
Section: Sequence Modeling
confidence: 99%
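The trade-off this statement describes, global BiLSTM context versus cheaper context-free modeling, can be made concrete. Below is a minimal sketch contrasting the two choices over the same CNN feature sequence; the dimensions and the 1D-convolution alternative are illustrative assumptions.

```python
import torch
import torch.nn as nn

B, T, D = 8, 26, 512          # batch size, sequence length, feature width (assumed)
feats = torch.randn(B, T, D)  # CNN feature columns, ordered left to right

# Contextual modeling: a BiLSTM sees the whole sequence but must run
# sequentially over all T steps, which is the cost that [23]-[25] avoid.
bilstm = nn.LSTM(D, 256, bidirectional=True, batch_first=True)
ctx, _ = bilstm(feats)        # (B, T, 512), each step conditioned on full context

# Context-free alternative: a 1D convolution mixes only a local window
# and parallelizes over T, trading global context for speed.
conv = nn.Conv1d(D, 512, kernel_size=3, padding=1)
local = conv(feats.transpose(1, 2)).transpose(1, 2)   # (B, T, 512)
```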
“…Zhan et al. [39] improved recognition performance by iteratively correcting the text image. Litman et al. [17] improved recognition by stacking the RNN decoder several times, and demonstrated the effectiveness of the stacked module.…”
Section: Irregular Text Recognition
confidence: 99%
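The stacking described here pairs naturally with the intermediate supervision mentioned in the abstract: each stacked block's output can receive its own loss during training. A minimal sketch of such a summed objective, written to consume the per-block logits returned by the `ScatterSketch` module sketched earlier; equal per-block weighting is an assumption.

```python
import torch
import torch.nn.functional as F

def stacked_supervision_loss(aux_logits_list, targets):
    """Average a recognition loss over every stacked block's output so that
    even the deepest layers receive a direct gradient signal (illustrative)."""
    total = 0.0
    for logits in aux_logits_list:        # one (B, T, C) logits tensor per block
        total = total + F.cross_entropy(
            logits.flatten(0, 1), targets.flatten())
    return total / len(aux_logits_list)   # equal weighting across blocks (assumed)
```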