A Large Chinese Text Dataset in the Wild

Yuan, Tailing; Zhu, Zhe; Xu, Kun; Li, Cheng-Jun; Mu, Tai‐Jiang; Hu, Shi‐Min

doi:10.1007/s11390-019-1923-y

Cited by 92 publications

(39 citation statements)

References 35 publications


“…For Chinese text, Liu et al [26] first introduced a dataset for the online and offline handwritten recognition. For Chinese text in the wild, MSRA-TD500 [42], RCTW-17 [36] and CTW [43] have been released to evaluate the performance of Chinese text reading models. Unlike all the previous datasets which only provide fully annotated images, the proposed C-SVT dataset also introduces a large amount of weakly annotated images with only the text labels in regions-of-interest, which are much easier to collect and have the potential to further improve the performance of text reading models.…”

Section: Related Work, 2.1 Text Reading Benchmarks (mentioning)

confidence: 99%

“…Unlike all the previous datasets which only provide fully annotated images, the proposed C-SVT dataset also introduces a large amount of weakly annotated images with only the text labels in regions-of-interest, which are much easier to collect and have the potential to further improve the performance of text reading models. C-SVT is at least 14 times as large as the previous Chinese benchmarks [36,43], making it the largest dataset for reading Chinese text in the wild.…”

Section: Related Work, 2.1 Text Reading Benchmarks (mentioning)

confidence: 99%

“…Previous end-to-end trainable text reading models [40,14,22,28,29] only utilize images in full annotations provided by the previous benchmarks [36,3,43]. The improvement in performance of these models requires more fully annotated training data, which is extremely expensive and inefficient in annotations.…”

Section: Partially Supervised Learning (mentioning)

confidence: 99%

“…Since the category number of Chinese characters in real-world images is much larger than those of Latin languages, the number of training samples of most current datasets is still limited per category and the distribution of characters is relatively unbalanced. Therefore, reading Chinese text in the wild requires more well annotated training samples, however, it is difficult for the existing benchmarks [36] [43] to satisfy the requirements mainly due to the high cost of data collections and location annotations of text regions.…”

Section: Introduction (mentioning)

confidence: 99%


Chinese Street View Text: Large-Scale Chinese Text Reading With Partially Supervised Learning

Sun, Liu, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


Most existing text reading benchmarks make it difficult to evaluate the performance of more advanced deep learning models in large vocabularies due to the limited amount of training data. To address this issue, we introduce a new large-scale text reading benchmark dataset named Chinese Street View Text (C-SVT) with 430,000 street view images, which is at least 14 times as large as the existing Chinese text reading benchmarks. To recognize Chinese text in the wild while keeping large-scale datasets labeling cost-effective, we propose to annotate one part of the C-SVT dataset (30,000 images) in locations and text labels as full annotations and add 400,000 more images, where only the corresponding text-of-interest in the regions is given as weak annotations. To exploit the rich information from the weakly annotated data, we design a text reading network in a partially supervised learning framework, which enables to localize and recognize text, learn from fully and weakly annotated data simultaneously. To localize the best matched text proposals from weakly labeled images, we propose an online proposal matching module incorporated in the whole model, spotting the keyword regions by sharing parameters for end-to-end training. Compared with fully supervised training algorithms, this model can improve the end-to-end recognition performance remarkably by 4.03% in F-score at the same labeling cost. The proposed model can also achieve state-of-the-art results on the ICDAR 2017-RCTW dataset, which demonstrates the effectiveness of the proposed partially supervised learning framework.


“…Recent powerful deep learning models contributed dramatically to the advances of robust text reading problems, including text detection, recognition and end-to-end text spotting. Benefiting from the pioneer work of the existing benchmarks [1], [2], [3], [4], [5], [6], [7], [8], [9], remarkable success has been achieved in text detection and recognition in the wild. Since most of the scene text datasets provide fully annotated ground truth (i.e.…”

Section: Introduction (mentioning)

confidence: 99%

ICDAR 2019 Competition on Large-Scale Street View Text with Partial Labeling - RRC-LSVT

Sun, Karatzas, Chan, et al. 2019

2019 International Conference on Document Analysis and Recognition (ICDAR)


Robust text reading from street view images provides valuable information for various applications. Performance improvement of existing methods in such a challenging scenario heavily relies on the amount of fully annotated training data, which is costly and inefficient to obtain. To scale up the amount of training data while keeping the labeling procedure cost-effective, this competition introduces a new challenge on Large-scale Street View Text with Partial Labeling (LSVT), providing 50,000 and 400,000 images in full and weak annotations, respectively. This competition aims to explore the abilities of state-of-the-art methods to detect and recognize text instances from large-scale street view images, closing the gap between research benchmarks and real applications. During the competition period, a total of 41 teams participated in the two proposed tasks with 132 valid submissions, i.e., text detection and end-to-end text spotting. This paper includes dataset descriptions, task definitions, evaluation protocols and results summaries of the ICDAR 2019-LSVT challenge.


From object detection to text detection and recognition: A brief evolution history of optical character recognition

Wang, Pan, Guo, et al. 2021

WIREs Computational Stats


Text detection and recognition, which is also known as optical character recognition (OCR), is an active research area under quick development with a lot of exciting applications. Deep‐learning‐based methods represent the state‐of‐art of this area. However, these methods are largely deterministic: they give a deterministic output for each input. For both statisticians and general users, methods supporting uncertainty inference are of great appeal, leaving rich research opportunities to incorporate statistical models and methods with the established deep‐learning‐based approaches. In this paper, we provide a comprehensive review of the evolution history of research development on OCR with discussions on the statistical insights behind these developments and potential directions to enhance the current methods with statistical approaches. We hope this article can serve as a useful guidebook for statisticians who are seeking for a path toward edge‐cutting research in this exciting area. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Deep Learning; Data: Types and Structure > Image and Spatial Data
