2019
DOI: 10.1007/s11390-019-1923-y

A Large Chinese Text Dataset in the Wild

Abstract: We introduce Chinese Text in the Wild, a very large dataset of Chinese text in street view images. While optical character recognition (OCR) in document images is well studied and many commercial tools are available, detection and recognition of text in natural images is still a challenging problem, especially for more complicated character sets such as Chinese text. Lack of training data has always been a problem, especially for deep learning methods which require massive training data. In this paper we provid…

Cited by 92 publications
(39 citation statements)
References 35 publications
“…For Chinese text, Liu et al [26] first introduced a dataset for the online and offline handwritten recognition. For Chinese text in the wild, MSRA-TD500 [42], RCTW-17 [36] and CTW [43] have been released to evaluate the performance of Chinese text reading models. Unlike all the previous datasets which only provide fully annotated images, the proposed C-SVT dataset also introduces a large amount of weakly annotated images with only the text labels in regions-of-interest, which are much easier to collect and have the potential to further improve the performance of text reading models.…”
Section: Related Work, 2.1 Text Reading Benchmarks (mentioning)
confidence: 99%
“…Unlike all the previous datasets which only provide fully annotated images, the proposed C-SVT dataset also introduces a large amount of weakly annotated images with only the text labels in regions-of-interest, which are much easier to collect and have the potential to further improve the performance of text reading models. C-SVT is at least 14 times as large as the previous Chinese benchmarks [36,43], making it the largest dataset for reading Chinese text in the wild.…”
Section: Related Work, 2.1 Text Reading Benchmarks (mentioning)
confidence: 99%
“…Recent powerful deep learning models contributed dramatically to the advances of robust text reading problems, including text detection, recognition and end-to-end text spotting. Benefiting from the pioneer work of the existing benchmarks [1], [2], [3], [4], [5], [6], [7], [8], [9], remarkable success has been achieved in text detection and recognition in the wild. Since most of the scene text datasets provide fully annotated ground truth (i.e.…”
Section: Introduction (mentioning)
confidence: 99%