2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00702
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Abstract: Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicit language modeling; 2) unidirectional feature representation; and 3) a language model with noisy input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet for scene text recognition. Firstly, the autonomous suggests to …
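Taken together, the three properties named in the abstract (an autonomous LM that never sees image features, bidirectional context, and iterative correction of noisy predictions) suggest a simple control flow. Below is a minimal PyTorch sketch of that loop; the `vision_model`, `language_model`, and `fusion` modules are hypothetical stand-ins meant to illustrate the idea, not the authors' implementation.

```python
# Minimal sketch of an autonomous + iterative recognizer, assuming PyTorch.
# Module internals are placeholders, not ABINet's actual code.
import torch
import torch.nn as nn

class IterativeRecognizer(nn.Module):
    def __init__(self, vision_model, language_model, fusion, num_iters=3):
        super().__init__()
        self.vision_model = vision_model      # image -> per-position char logits
        self.language_model = language_model  # char probs -> refined char logits
        self.fusion = fusion                  # combines visual and linguistic cues
        self.num_iters = num_iters

    def forward(self, images):
        vis_logits = self.vision_model(images)
        logits = vis_logits
        for _ in range(self.num_iters):
            # "Autonomous": the LM never sees image features, only the
            # (possibly noisy) character distribution from the previous pass;
            # detach() keeps the two branches' gradients separate.
            lm_logits = self.language_model(logits.softmax(dim=-1).detach())
            logits = self.fusion(vis_logits, lm_logits)
        return logits
```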

Cited by 188 publications (157 citation statements). References 45 publications.
“…Also, some works [21,58,82] attempt to construct text recognizers based on the Transformer [64], which has thrived in the field of natural language processing, to robustly learn representations for text images through self-attention modules. Recently, some researchers have incorporated semantic knowledge into text recognizers to fully exploit external language priors [20,54,75,88]. For example, SEED [54] utilizes text embeddings guided by fastText [8] to initialize the attention-based decoder.…”
Section: Existing Text Recognition Methods
Citation type: mentioning (confidence: 99%)
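The SEED-style guidance mentioned above can be pictured as a decoder whose initial state comes from a predicted word embedding, with a cosine loss pulling that prediction toward a pre-trained fastText vector. The sketch below is a hedged approximation; all module names, argument names, and dimensions (`semantic_head`, `target_embedding`, etc.) are illustrative assumptions, not SEED's actual code.

```python
# Sketch of embedding-guided decoding in the spirit of SEED, assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticGuidedDecoder(nn.Module):
    def __init__(self, feat_dim=512, embed_dim=300, hidden_dim=256, num_classes=97):
        super().__init__()
        self.semantic_head = nn.Linear(feat_dim, embed_dim)  # predicts a word embedding
        self.init_proj = nn.Linear(embed_dim, hidden_dim)    # embedding -> initial state
        self.rnn = nn.GRUCell(num_classes, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, pooled_visual_feat, prev_chars_onehot, target_embedding=None):
        semantic = self.semantic_head(pooled_visual_feat)
        # At training time, a cosine loss pulls the predicted vector toward
        # the fastText embedding of the ground-truth word.
        sem_loss = None
        if target_embedding is not None:
            sem_loss = 1 - F.cosine_similarity(semantic, target_embedding).mean()
        h = torch.tanh(self.init_proj(semantic))  # semantics seed the decoder state
        logits = []
        for t in range(prev_chars_onehot.size(1)):
            h = self.rnn(prev_chars_onehot[:, t], h)
            logits.append(self.classifier(h))
        return torch.stack(logits, dim=1), sem_loss
```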
“…For example, SEED [54] utilizes text embeddings guided by fastText [8] to initialize the attention-based decoder. ABINet [20] designs two autonomous branches to iteratively optimize the vision and language models. Overall, these general text recognition methods can be easily transferred to Chinese scenarios by simply replacing the alphabet or the language priors.…”
Section: Existing Text Recognition Methods
Citation type: mentioning (confidence: 99%)
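The excerpt's "replacing the alphabet" amounts to swapping the character table that sizes the recognizer's output layer. A toy sketch of that label mapping follows; the Chinese charset here is a ten-character stand-in for a real set of thousands.

```python
# Illustrative charset swap: transferring a recognizer to Chinese mostly means
# rebuilding the character<->index maps and resizing the output projection.
LATIN_CHARSET = list("abcdefghijklmnopqrstuvwxyz0123456789")
CHINESE_CHARSET = list("的一是不了人我在有他")  # toy subset for illustration

def build_label_maps(charset, blank="<blank>"):
    """Map characters to class indices and back; index 0 is reserved."""
    itos = [blank] + charset
    stoi = {ch: i for i, ch in enumerate(itos)}
    return stoi, itos

stoi, itos = build_label_maps(CHINESE_CHARSET)
num_classes = len(itos)  # sizes the recognizer's final classification layer
```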
“…More specifically, they obtain OCR annotations from the open-source OCR engine Tesseract [45] for 5 million documents from the IIT-CDIP [25] dataset. With the introduction of pre-training strategies and advances in modern OCR engines [1,12,20,28,34], many contemporary approaches [7,2,53] have utilized even more data to advance the Document Intelligence field.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
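For the Tesseract-based pseudo-labeling described above, the open-source `pytesseract` wrapper exposes word-level boxes and confidences. A small sketch, assuming PNG document images in a local directory; the paths and the confidence filter are my own choices, not the cited pipeline's.

```python
# Harvest word-level OCR pseudo-annotations with Tesseract via pytesseract.
from pathlib import Path
from PIL import Image
import pytesseract

def annotate_documents(image_dir: str):
    """Yield (path, word, bounding box) pseudo-labels for each document image."""
    for path in Path(image_dir).glob("*.png"):
        data = pytesseract.image_to_data(
            Image.open(path), output_type=pytesseract.Output.DICT
        )
        for word, conf, x, y, w, h in zip(
            data["text"], data["conf"], data["left"],
            data["top"], data["width"], data["height"]
        ):
            if word.strip() and float(conf) > 0:  # drop empty/low-confidence boxes
                yield path, word, (x, y, w, h)
```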
“…They typically consist of a visual feature extractor, abstracting the image patch, and a character sequence generator, responsible for character decoding. Despite wide exploration of better visual feature extractors and character sequence generators, existing methods still suffer in challenging environments: occlusion, blur, distortion, and other artifacts [2,3].…”
[Figure panels: (a) STR methods with an LM [6,28]; (b) Semantic-MATRN and [3]; (c) Visual-MATRN and [25]; (d) MATRN]
Section: Introduction
Citation type: mentioning (confidence: 99%)
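The "visual feature extractor plus character sequence generator" shape described above can be made concrete with a small CRNN-like skeleton. This is a generic sketch of the architecture family, assuming PyTorch; it does not reproduce any cited model.

```python
# Generic two-stage text recognizer: CNN feature extractor + BiLSTM generator.
import torch
import torch.nn as nn

class TwoStageRecognizer(nn.Module):
    def __init__(self, num_classes=97, hidden=256):
        super().__init__()
        self.extractor = nn.Sequential(          # image -> feature map
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),     # collapse height, keep width
        )
        self.generator = nn.LSTM(128, hidden, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, images):                   # images: (B, 3, H, W)
        feats = self.extractor(images).squeeze(2).transpose(1, 2)  # (B, W', 128)
        seq, _ = self.generator(feats)
        return self.classifier(seq)              # per-position character logits
```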
“…Once a seed character sequence is identified, a bidirectional Transformer encoder re-estimates the character at each position. Building on SRN, Fang et al. [6] improve the iterative refinement stages by explicitly dividing the vision model (VM) from the LM, blocking gradient flows between them and employing a bidirectional LM pre-trained on unlabeled text datasets. These methods, which incorporate the semantic knowledge of LMs, provide breakthroughs in recognizing challenging examples with ambiguous visual clues.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
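The division this excerpt describes, a bidirectional Transformer encoder that re-estimates every position while gradients are blocked between the vision branch and the LM, can be sketched as a cloze-style corrector over soft character distributions. All names and sizes below are assumptions for illustration, not the cited paper's code.

```python
# Bidirectional cloze-style corrector over noisy character distributions.
import torch
import torch.nn as nn

class BidirectionalCorrector(nn.Module):
    def __init__(self, num_classes=97, d_model=256, nhead=4, num_layers=2, max_len=32):
        super().__init__()
        self.embed = nn.Linear(num_classes, d_model)  # soft char probs -> embeddings
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.out = nn.Linear(d_model, num_classes)

    def forward(self, char_probs):                # char_probs: (B, T, num_classes)
        # detach() blocks gradient flow into (and out of) the vision branch,
        # so the corrector trains purely as a spelling model over noisy input.
        x = self.embed(char_probs.detach()) + self.pos[:, : char_probs.size(1)]
        return self.out(self.encoder(x))          # refined per-position logits
```

Because the encoder attends over the whole sequence at once, each position is re-estimated from both its left and right context in a single pass, which is what makes the refinement bidirectional rather than autoregressive.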