2021
DOI: 10.48550/arxiv.2112.15093
Preprint

Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study

Abstract: The flourishing blossom of deep learning has witnessed the rapid development of text recognition in recent years. However, the existing text recognition methods are mainly for English texts, whereas ignoring the pivotal role of Chinese texts. As another widely-spoken language, Chinese text recognition in all ways has extensive application markets. Based on our observations, we attribute the scarce attention on Chinese text recognition to the lack of reasonable dataset construction standards, unified evaluation…

citations
Cited by 5 publications
(15 citation statements)
references
References 80 publications
“…The Sequence-to-Sequence models (Zhang et al 2020b; Wang et al 2019; Sheng, Chen, and Xu 2019; Bleeker and de Rijke 2019; Lee et al 2020; Atienza 2021; Chen et al 2021) are gradually attracting more attention, especially after the advent of the Transformer architecture (Vaswani et al 2017). SaHAN (Zhang et al 2020b), standing for the scale-aware hierarchical attention network, are proposed to address the character scale-variation issue.…”
Section: Related Work Scene Text Recognition (mentioning)
confidence: 99%
“…Different from previous methods, we only utilize text labels instead of pixel-wise labels. The bi-lingual text-related benchmark [18] can also be used for training the recognition module. Implementation Details The size of the input is set to 48×160 and the number of channels for the fused feature is set to 512.…”
Section: Datasets and Implementation Details Datasets (mentioning)
confidence: 99%
“…Scene text recognition (STR) [33], [1], [2], [34], [35] has made great progress in recent years. Specifically, CRNN [36] takes CNN and RNN as the encoder and employs a CTC-based [37] decoder to maximize the probabilities of paths that can reach the ground truth.…”
Section: Scene Text Recognition (mentioning)
confidence: 99%