Detecting multi-oriented text with corner-based region proposals

Deng, Linjie; Gong, Yanxiang; Lin, Yi; Shuai, Jingwen; Tu, Xiaoguang; Zhang, Yuefei; Ma, Zheng; Xie, Mei

doi:10.1016/j.neucom.2019.01.013

Cited by 33 publications

(14 citation statements)

References 52 publications

“…Even the multi-scale, our method runs at a speed of 10.5 fps. Compared with recent methods [21,17,23,4], our method is comparable with accuracy and efficiency.…”

Section: Comparison To State Of the Art (mentioning)

confidence: 85%

STELA: A Real-Time Scene Text Detector With Learned Anchor

Deng, Gong, et al. 2019, IEEE Access (self-citation)

To achieve high coverage of target boxes, a normal strategy of conventional one-stage anchor-based detectors is to utilize multiple priors at each spatial position, especially in scene text detection tasks. In this work, we present a simple and intuitive method for multi-oriented text detection where each location of feature maps only associates with one reference box. The idea is inspired by the two-stage R-CNN framework, which can estimate the location of objects with any shape by using learned proposals. The aim of our method is to integrate this mechanism into a one-stage detector and employ the learned anchor, which is obtained through a regression operation, to replace the original one in the final predictions. Based on RetinaNet, our method achieves competitive performance on several public benchmarks with real-time efficiency (26.5 fps at 800p), which surpasses all anchor-based scene text detectors. In addition, with less attention on anchor design, we believe our method can be easily applied to other analogous detection tasks. The code will be publicly available at https://github.com/xhzdeng/stela.

“…As depicted in [4], using rectangular bounding boxes to localize multi-oriented text may result in redundant background noise and unnecessary overlap. Thus, we adopt rotated rectangular boxes to match arbitrary-oriented text instances.…”

Section: Rotated Bounding Box Regression (mentioning)

confidence: 99%

STELA: A Real-Time Scene Text Detector With Learned Anchor

Deng, Gong, et al. 2019, IEEE Access (self-citation)

“…Through convolution extension, Li et al [25] created a CNN with multi-scale sliding window; the extended (or atrophic) convolution, which supports the exponential expansion of the receptive field, without scarifying the resolution or coverage, was adopted to expand a convolution filter; this filter was used to piece up a large background through fast computation with a few parameters. In addition, several loss functions have been proposed for bounding box regression: intersection over union network (IoU-Net) [26], Precise Rol Pooling (PrRol-Pooling) [27], and generalized IoU (GIoU) [28]. These functions open a new way to recognize traffic signs with multi-scale CNN.…”

Section: B Deep Learning-based Traffic Sign Recognition (mentioning)

confidence: 99%

Automatic Recognition of Traffic Signs Based on Visual Inspection

Chen, Zhang, et al. 2021, IEEE Access

The automatic recognition of traffic signs is essential to autonomous driving, assisted driving, and driving safety. Currently, the convolutional neural network (CNN) is the most popular deep learning algorithm in traffic sign recognition. However, the CNN cannot capture the poses, perspectives, and directions of the image, nor accurately recognize traffic signs from different perspectives. To solve the problem, the authors presented an automatic recognition algorithm for traffic signs based on visual inspection. For the accuracy of visual inspection, a region of interest (ROI) extraction method was designed through content analysis and key information recognition. Besides, a Histogram of Oriented Gradients (HOG) method was developed for image detection to prevent projection distortion. Furthermore, a traffic sign recognition learning architecture was created based on CapsNet, which relies on neurons to represent target parameters like dynamic routing, path pose and direction, and effectively capture the traffic sign information from different angles or directions. Finally, our model was compared with several baseline methods through experiments on the LISA (Laboratory for Intelligent and Safe Automobiles) traffic sign dataset. The model performance was measured by mean average precision (MAP), time, memory, floating point operations per second (FLOPS), and parameter number. The results show that our model took less time yet achieved better recognition performance than baseline methods, including CNN, support vector machine (SVM), and region-based fully convolutional network (R-FCN) ResNet 101.

“…In [31], the authors proposed the rotation-sensitive regression detector (RRD) framework to perform classification and regression on different features extracted by two different designs of network branches. Deng et al [32] proposed a new two-stage algorithm. In the first stage, the method predicts text instance locations by detecting and linking corners instead of traditional anchor points.…”

Section: Related Work (mentioning)

confidence: 99%

Towards Accurate Scene Text Detection with Bidirectional Feature Pyramid Network

2021

Scene text detection, the task of detecting text in real images, is a hot research topic in the machine vision community. Most of the current research is based on an anchor box. These methods are complex in model design and time-consuming to train. In this paper, we propose a new Fully Convolutional One-Stage Object Detection (FCOS)-based text detection method that can robustly detect multi-oriented and multilingual text from natural scene images with a per-pixel prediction approach. Our proposed text detector employs an anchor-free approach, unlike state-of-the-art text detectors that rely on a predefined anchor box. In order to enhance the feature representation ability of FCOS for text detection tasks, we apply the Bidirectional Feature Pyramid Network (BiFPN) as the backbone network, enhancing the model learning capacity and increasing the receptive field. We demonstrate the superior performance of our method on multi-oriented (ICDAR-2015, ICDAR-2017 MLT) and horizontal (ICDAR-2013) text detection benchmark tasks. Moreover, our method has an f-measure of 88.65 and 86.32 for the benchmark datasets ICDAR 2013 and ICDAR 2015, respectively, and 80.75 for the ICDAR-2017 MLT dataset.
