Canjie Luo scite author profile

Text recognition has attracted considerable research interest because of its many applications. Cutting-edge text recognition methods are based on attention mechanisms. However, most attention methods suffer from a serious alignment problem caused by their recurrent alignment operation, in which the alignment relies on historical decoding results. To remedy this issue, we propose a decoupled attention network (DAN), which decouples the alignment operation from the historical decoding results. DAN is an effective, flexible and robust end-to-end text recognizer that consists of three components: 1) a feature encoder that extracts visual features from the input image; 2) a convolutional alignment module that performs the alignment operation based on visual features from the encoder; and 3) a decoupled text decoder that makes the final prediction by jointly using the feature map and attention maps. Experimental results show that DAN achieves state-of-the-art performance on multiple text recognition tasks, including offline handwritten text recognition and regular/irregular scene text recognition. Code will be released.

Curved scene text detection via transverse and longitudinal sequence connection

Liu

Jin

Zhang

et al. 2019

Pattern Recognition

200

EraseNet: End-to-End Text Removal in the Wild

Liu

Jin

et al. 2020

IEEE Trans. on Image Process.

Text Recognition in the Wild

et al. 2021

The history of text can be traced back over thousands of years. The rich and precise semantic information carried by text is important in a wide range of vision-based application scenarios. Therefore, text recognition in natural scenes has been an active research topic in computer vision and pattern recognition. In recent years, with the rise and development of deep learning, numerous methods have shown promising results in terms of innovation, practicality, and efficiency. This article aims to (1) summarize the fundamental problems and the state of the art in scene text recognition, (2) introduce new insights and ideas, (3) provide a comprehensive review of publicly available resources, and (4) point out directions for future work. In summary, this literature review attempts to present a complete picture of the field of scene text recognition. It provides a comprehensive reference for people entering this field and could help inspire future research. Related resources are available at our GitHub repository: https://github.com/HCIILAB/Scene-Text-Recognition.

Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Luo

Zhu

Jin

et al. 2020

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT

et al. 2019

This paper reports on the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT), which consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting. A total of 78 submissions from 46 unique teams/individuals were received for this competition. The top-performing score for each task is as follows: i) T1: 82.65%, ii) T2.1: 74.3%, iii) T2.2: 85.32%, iv) T3.1: 53.86%, and v) T3.2: 54.91%. Apart from the results, this paper also details the ArT dataset, task descriptions, evaluation metrics and participants' methods. The dataset, the evaluation kit, and the results are publicly available at the challenge website.

ICDAR 2019 Competition on Large-Scale Street View Text with Partial Labeling - RRC-LSVT

Sun

Karatzas

Chan

et al. 2019

Robust text reading from street view images provides valuable information for various applications. Improving the performance of existing methods in such a challenging scenario relies heavily on the amount of fully annotated training data, which is costly and inefficient to obtain. To scale up the amount of training data while keeping the labeling procedure cost-effective, this competition introduces a new challenge on Large-scale Street View Text with Partial Labeling (LSVT), providing 50,000 images with full annotations and 400,000 images with weak annotations. This competition aims to explore the abilities of state-of-the-art methods to detect and recognize text instances from large-scale street view images, closing the gap between research benchmarks and real applications. During the competition period, a total of 41 teams participated in the two proposed tasks, i.e., text detection and end-to-end text spotting, with 132 valid submissions. This paper includes dataset descriptions, task definitions, evaluation protocols and result summaries of the ICDAR 2019-LSVT challenge.

MORAN: A Multi-Object Rectified Attention Network for scene text recognition

Decoupled Attention Network for Text Recognition