Rotation-Sensitive Regression for Oriented Scene Text Detection

Liao, Minghui; Zhu, Zhen; Shi, Baoguang; Xia, Gui-Song; Bai, Xiang

doi:10.1109/cvpr.2018.00619

Cited by 460 publications

(226 citation statements)

References 43 publications

(99 reference statements)

Mentioning: 224


“…7 (c)(d)). Large character spacing is an unresolved problem which also exists in other state-of-the-art methods such as RRD [28]. For symbol detection and false positives, PAN is trained on small datasets (about 1000 images) and we believe this problem will be alleviated when increasing training data.…”

Section: Failure Samples (mentioning)

confidence: 99%

Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network

Wang, Xie, Song, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)



Scene text detection, an important step of scene text reading systems, has witnessed rapid development with convolutional neural networks. Nonetheless, two main challenges still exist and hamper its deployment to real-world applications. The first problem is the trade-off between speed and accuracy. The second one is to model the arbitrary-shaped text instance. Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing. More specifically, the segmentation head is made up of Feature Pyramid Enhancement Module (FPEM) and Feature Fusion Module (FFM). FPEM is a cascadable U-shaped module, which can introduce multi-level information to guide the better segmentation. FFM can gather the features given by the FPEMs of different depths into a final feature for segmentation. The learnable post-processing is implemented by Pixel Aggregation (PA), which can precisely aggregate text pixels by predicted similarity vectors. Experiments on several standard benchmarks validate the superiority of the proposed PAN. It is worth noting that our method can achieve a competitive F-measure of 79.9% at 84.2 FPS on CTW1500.

Figure 1. The performance and speed on curved text dataset CTW1500. PAN-640 is 10.7% better than CTD+TLOC, and PAN-320 is 4 times faster than EAST. * indicates the results from [31].


“…Note that 'MS' denotes multi-scale testing of the trained models. Compared with the previous methods, e.g., RRD [24] and Border [41] , our baseline text reading models ('End2End' and 'End2End-MS') show slightly better performance in F-score for detection, which is tested in single and multiple scales, respectively. Note that the end-to-end baseline of ICDAR 2017-RCTW [36] marked with + used a large synthetic dataset with a Chinese Figure 6: Matching examples generated by OPM module.…”

Section: Comparisons With Other Approaches (mentioning)

confidence: 81%

Chinese Street View Text: Large-Scale Chinese Text Reading With Partially Supervised Learning

Sun, Liu, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


Most existing text reading benchmarks make it difficult to evaluate the performance of more advanced deep learning models in large vocabularies due to the limited amount of training data. To address this issue, we introduce a new large-scale text reading benchmark dataset named Chinese Street View Text (C-SVT) with 430,000 street view images, which is at least 14 times as large as the existing Chinese text reading benchmarks. To recognize Chinese text in the wild while keeping the labeling of large-scale datasets cost-effective, we propose to annotate one part of the C-SVT dataset (30,000 images) in locations and text labels as full annotations and add 400,000 more images, where only the corresponding text-of-interest in the regions is given as weak annotations. To exploit the rich information from the weakly annotated data, we design a text reading network in a partially supervised learning framework, which enables it to localize and recognize text and to learn from fully and weakly annotated data simultaneously. To localize the best matched text proposals from weakly labeled images, we propose an online proposal matching module incorporated in the whole model, spotting the keyword regions by sharing parameters for end-to-end training. Compared with fully supervised training algorithms, this model can improve the end-to-end recognition performance remarkably by 4.03% in F-score at the same labeling cost. The proposed model can also achieve state-of-the-art results on the ICDAR 2017-RCTW dataset, which demonstrates the effectiveness of the proposed partially supervised learning framework.


“…Even the multi-scale, our method runs at a speed of 10.5 fps. Compared with recent methods [21,17,23,4], our method is comparable with accuracy and efficiency.…”

Section: Comparison To State Of the Art (mentioning)

confidence: 85%

STELA: A Real-Time Scene Text Detector With Learned Anchor

Deng, Gong, et al. 2019

IEEE Access


To achieve high coverage of target boxes, a normal strategy of conventional one-stage anchor-based detectors is to utilize multiple priors at each spatial position, especially in scene text detection tasks. In this work, we present a simple and intuitive method for multi-oriented text detection where each location of the feature maps is associated with only one reference box. The idea is inspired by the two-stage R-CNN framework, which can estimate the location of objects with any shape by using learned proposals. The aim of our method is to integrate this mechanism into a one-stage detector and employ the learned anchor, which is obtained through a regression operation, to replace the original one in the final predictions. Based on RetinaNet, our method achieves competitive performance on several public benchmarks with real-time efficiency (26.5 fps at 800p), which surpasses all anchor-based scene text detectors. In addition, with less attention on anchor design, we believe our method is easy to apply to other analogous detection tasks. The code will be publicly available at https://github.com/xhzdeng/stela.
