2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) 2017
DOI: 10.1109/icdar.2017.237

ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT

Abstract: HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Cited by 350 publications (222 citation statements)
References 9 publications
“…Our CharNet is evaluated on three standard benchmarks: ICDAR 2015 [17], Total-Text [5], and ICDAR MLT 2017 [27]. ICDAR 2015 includes 1,500 images collected by using Google Glasses.…”
Section: Experiments, Results and Comparisons
confidence: 99%
“…To validate its effectiveness, we adopt the state-of-the-art RetinaNet [19] as our baseline model and present a simple and intuitive text detector named STELA (Scene TExt Detector with Learned Anchor), in which each location of the feature maps is associated with only one anchor. Following the standard evaluation protocols of each benchmark, our method achieves comparable performance, with an F-measure of 0.887 on ICDAR 2013 [12], 0.833 on ICDAR 2015 [11] and 0.715 on ICDAR 2017 MLT [25]. Besides, our method is a fully real-time scene text detector at 26.5 fps on 800p input, which surpasses all anchor-based methods.…”
Section: Introduction
confidence: 86%
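The F-measures quoted in the excerpt above are the standard harmonic mean of detection precision and recall. A minimal sketch of that computation (the input values below are illustrative, not figures from the paper):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values only: a detector with precision 0.90 and recall 0.87
print(round(f_measure(0.90, 0.87), 3))  # → 0.885
```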
“…Restricted by the hardware, the batch size is set to 4 and the initial learning rate is set to 10⁻⁴. We randomly pick 100,000 images from SynthText [7] to pretrain the network for 5 epochs, and collect real data from ICDAR 2013 [12], 2015 [11] and 2017 [25] to finetune a final model for 25 epochs. The learning rate is decayed to 10⁻⁵ after 15 epochs of finetuning.…”
Section: Implementation Details
confidence: 99%
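The finetuning schedule described above (learning rate 10⁻⁴, stepped down to 10⁻⁵ after 15 of 25 epochs) is a plain step decay. A minimal sketch, using the values quoted in the excerpt:

```python
def step_lr(epoch: int, base_lr: float = 1e-4, decayed_lr: float = 1e-5,
            decay_epoch: int = 15) -> float:
    """Step decay: hold base_lr, then drop to decayed_lr from decay_epoch on."""
    return base_lr if epoch < decay_epoch else decayed_lr

# Learning rate over the 25 finetuning epochs mentioned in the excerpt
schedule = [step_lr(e) for e in range(25)]
```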
“…CVSI2015 [25] is released for the ICDAR 2015 Competition on Video Script Identification, containing text line images of 10 Indian scripts. RRC-MLT2017 [26] is released for ICDAR 2017 Competition on MLT-Task2, comprising 68,613 training, 16,255 validation and 97,619 test cropped images. This dataset holds an extremely imbalanced distribution among 7 scripts and especially tilts to Latin.…”
Section: Methods
confidence: 99%
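A common remedy for the extreme script imbalance noted above (the Latin-heavy distribution of RRC-MLT2017) is inverse-frequency class weighting in the loss. A minimal sketch with hypothetical counts, not the actual RRC-MLT distribution:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * count) to counter imbalance."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * c) for cls, c in counts.items()}

# Hypothetical, Latin-skewed label counts for illustration only
labels = ["Latin"] * 80 + ["Arabic"] * 15 + ["Bangla"] * 5
weights = inverse_frequency_weights(labels)  # rarer scripts get larger weights
```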
“…2) We design softermax loss to accomplish patch-level weak supervision on local predictions with image-level label. 3) Experiments are conducted on three public datasets, i.e., SIW-13 [24], CVSI2015 [25] and RRC-MLT2017 [26], and achieve state-of-the-art performance.…”
Section: Introduction
confidence: 99%