2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01185

Scene Text Telescope: Text-Focused Scene Image Super-Resolution

Cited by 71 publications (53 citation statements)
References 33 publications
“…Besides, as shown in Figure 7, the scene dataset comprises many vertical text images, which indeed pose obstacles for baselines (e.g., CRNN [59], ASTER [60], and MORAN [45]) that simply transform the original input into a 1-D feature sequence. By contrast, 2-D methods (e.g., SAR [36], SRN [88], and TransOCR [10]) achieve better performance on this dataset, as 2-D feature maps are more robust for tackling text images with special layouts (e.g., vertical or curved). Further, by taking advantage of its self-attention modules, TransOCR [10] surpasses all its counterparts with a recognition accuracy of 63.3%, as it can model sequential features more flexibly.…”
Section: Analysis Of Experimental Results (mentioning)
Confidence: 85%
“…By contrast, 2-D methods (e.g., SAR [36], SRN [88], and TransOCR [10]) achieve better performance on this dataset, as 2-D feature maps are more robust for tackling text images with special layouts (e.g., vertical or curved). Further, by taking advantage of its self-attention modules, TransOCR [10] surpasses all its counterparts with a recognition accuracy of 63.3%, as it can model sequential features more flexibly. Finally, we notice that SEED [54] does not perform well on this dataset.…”
Section: Analysis Of Experimental Results (mentioning)
Confidence: 85%