Proceedings of the 2021 International Conference on Multimedia Retrieval 2021
DOI: 10.1145/3460426.3463623
NASTER: Non-local Attentional Scene Text Recognizer

Abstract: Scene text recognition has been widely investigated in computer vision. In the literature, the encoder-decoder framework, which first encodes the image into a feature map and then decodes it into the corresponding text sequence, has achieved great success. However, this solution fails on low-quality images, as the local visual features extracted from curved or blurred images are difficult to decode into the corresponding text. To address this issue, we propose a new framework for Scene Text Recognition (STR), name…
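The encode-then-decode pipeline the abstract describes can be sketched in a few lines. This is a toy illustration only, assuming hypothetical shapes and random weights; it is not the authors' NASTER model, just the generic "image → feature sequence → attentional character decoding" flow.

```python
import numpy as np

def encode(image):
    """Toy stand-in for a CNN encoder: flatten the image into a
    T x D feature sequence (T, D are hypothetical, not from the paper)."""
    H, W = image.shape
    T, D = 8, 16
    feats = image.reshape(T, (H * W) // T)          # T slices of the image
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((feats.shape[1], D)) # random projection to D dims
    return feats @ proj                             # (T, D)

def attend_decode(feats, num_chars=5, vocab=27):
    """Toy attentional decoder: one attention glimpse per output character."""
    rng = np.random.default_rng(1)
    chars = []
    for _ in range(num_chars):
        query = rng.standard_normal(feats.shape[1])
        scores = feats @ query                      # alignment scores over T
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                        # softmax attention weights
        context = alpha @ feats                     # weighted glimpse, (D,)
        w_out = rng.standard_normal((feats.shape[1], vocab))
        chars.append(int(np.argmax(context @ w_out)))  # predicted char id
    return chars

image = np.random.default_rng(2).standard_normal((32, 64))
feats = encode(image)
chars = attend_decode(feats)
```

The abstract's failure mode is visible here: the decoder only ever sees local slices pooled by the encoder, so if those slices are blurred or curved, every glimpse is degraded; a non-local attention mechanism over the whole feature map is the kind of remedy the paper proposes.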

Cited by 2 publications (1 citation statement)
References 33 publications (44 reference statements)
“…A transformer decoder is then used to decode the image features into sequences of characters. A similar model was also proposed by Wu et al. [42], utilizing a transformer decoder preceded by a global context ResNet (GCNet). Zhang et al. [50] employed a combination of CNN and RNN as the encoder and a transformer-inspired cross-network attention as part of the decoder in their cascade attention network.…”
Section: Introduction
Confidence: 99%