Deep Direct Regression for Multi-oriented Scene Text Detection

He, Wenhao; Zhang, Xu-Yao; Yin, Fei; Liu, Cheng‐Lin

doi:10.1109/iccv.2017.87

Cited by 372 publications

(200 citation statements)

References 27 publications

Supporting

Mentioning

200

Contrasting


“…[69] and [64] propose methods which first detect text segments and then link them into text instances by spatial relationship or link predictions. Zhou et al [82] and He et al [27] regress text boxes directly from dense segmentation maps. Lyu et al [51] propose to detect and group the corner points of the text to generate text boxes.…”

Section: Scene Text Detection (mentioning)

confidence: 99%

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Liao

Lyu

et al. 2021

IEEE Trans. Pattern Anal. Mach. Intell.



Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network named Mask TextSpotter is presented. Unlike previous text spotters, which follow a pipeline consisting of a proposal generation network and a sequence-to-sequence recognition network, Mask TextSpotter enjoys a simple and smooth end-to-end learning procedure, in which both detection and recognition can be achieved directly from two-dimensional space via semantic segmentation. Further, a spatial attention module is proposed to enhance the performance and universality. Benefiting from the proposed two-dimensional representation on both detection and recognition, it easily handles text instances of irregular shapes, for instance, curved text. We evaluate it on four English datasets and one multi-language dataset, achieving consistently superior performance over state-of-the-art methods in both detection and end-to-end text recognition tasks. We also investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.


“…The aspect ratio of text lines varies greatly, and limited anchors cannot cover the size or aspect ratio of all objects; thus, many methods are anchor-free. Both [4] and [1] generate labels with shrunk segmentation maps, and regress the vertices or angles of the bounding box on positive pixels. [29] generates a corner map and a position-sensitive segmentation map, calculates oriented bounding boxes based on the corner map, and calculates the score for each bounding box using the position-sensitive segmentation map.…”

Section: B Oriented Objects Detection (mentioning)

confidence: 99%

Adaptive Period Embedding for Representing Oriented Objects in Aerial Images

Zhu

2020

IEEE Trans. Geosci. Remote Sensing


We propose a novel method for representing oriented objects in aerial images named Adaptive Period Embedding (APE). While traditional object detection methods represent objects with horizontal bounding boxes, the objects in aerial images are oriented. Calculating the angle of an object is still a challenging task. While almost all previous object detectors for aerial images directly regress the angle of objects, they use complex rules to calculate the angle, and their performance is limited by the rule design. In contrast, our method is based on the angular periodicity of oriented objects. The angle is represented by two two-dimensional periodic vectors with different periods, and the vectors are continuous as the shape changes. The label generation rule is simpler and more reasonable than in previous methods. The proposed method is general and can be applied to other oriented detectors. Besides, we propose a novel IoU calculation method for long objects named length independent IoU (LIIoU). We intercept part of the long side of the target box to get the maximum IoU between the proposed box and the intercepted target box. Thereby, some long boxes will have corresponding positive samples. Our method took 1st place in the DOAI2019 competition task 1 (oriented object), held at the Workshop on Detecting Objects in Aerial Images in conjunction with IEEE CVPR 2019.
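The citation statement quoted for this paper describes anchor-free detectors that regress the vertices of a bounding box on the positive pixels of a segmentation map. A minimal sketch of the decoding step is given below, as an illustration only and not as the implementation of any cited method. The function name `decode_quads`, the `threshold` parameter, and the offset layout `(dx1, dy1, ..., dx4, dy4)` are all assumptions made for this example; real detectors would also merge the overlapping candidates, for instance with non-maximum suppression.

```python
def decode_quads(score_map, offset_map, threshold=0.5):
    """Turn per-pixel vertex offsets into candidate quadrilaterals.

    score_map:  H x W nested list of text/non-text scores.
    offset_map: H x W nested list; each entry holds 8 numbers
                (dx1, dy1, ..., dx4, dy4), the assumed offsets from the
                pixel to the four vertices of its text quadrilateral.
    Returns one quadrilateral (a list of four (x, y) tuples) per positive pixel.
    """
    quads = []
    for y, row in enumerate(score_map):
        for x, score in enumerate(row):
            if score <= threshold:
                continue  # negative pixel: no box is regressed here
            d = offset_map[y][x]
            # Add each (dx, dy) pair to the pixel position to get a vertex.
            quad = [(x + d[2 * i], y + d[2 * i + 1]) for i in range(4)]
            quads.append(quad)
    return quads


# Hypothetical 3 x 4 maps with a single positive pixel at (x=2, y=1).
score_map = [[0.0] * 4 for _ in range(3)]
offset_map = [[[0] * 8 for _ in range(4)] for _ in range(3)]
score_map[1][2] = 0.9
offset_map[1][2] = [-2, -1, 2, -1, 2, 1, -2, 1]

print(decode_quads(score_map, offset_map))
# [[(0, 0), (4, 0), (4, 2), (0, 2)]]
```

Each positive pixel votes for its own box, so no predefined anchor shapes are needed; this is what lets such schemes cover the wide range of aspect ratios mentioned in the statement.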


“…For bounding box regression based methods, they can be divided into one-stage methods and two-stage methods. One-stage methods including Deep Direct Regression [5], TextBox [12], TextBoxes++ [11], DMPNet [16], SegLink [24] and EAST [34], directly estimate bounding boxes of text regions in one step. Two-stage methods include R2CNN [8], RRD [13], RRPN [22], IncepText [28] and FEN [31].…”

Section: Related Work (mentioning)

confidence: 99%

Arbitrary Shape Scene Text Detection With Adaptive Text Region Representation

Wang

Jiang

Luo

et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite



Scene text detection attracts much attention in computer vision, because it can be widely used in many applications such as real-time text translation, automatic information entry, blind person assistance, robot sensing and so on. Though many methods have been proposed for horizontal and oriented texts, detecting irregular shape texts such as curved texts is still a challenging problem. To solve the problem, we propose a robust scene text detection method with adaptive text region representation. Given an input image, a text region proposal network is first used for extracting text proposals. Then, these proposals are verified and refined with a refinement network. Here, recurrent neural network based adaptive text region representation is proposed for text region refinement, where a pair of boundary points is predicted at each time step until no new points are found. In this way, text regions of arbitrary shapes are detected and represented with an adaptive number of boundary points. This gives a more accurate description of text regions. Experimental results on five benchmarks, namely, CTW1500, TotalText, ICDAR2013, ICDAR2015 and MSRA-TD500, show that the proposed method achieves state-of-the-art performance in scene text detection.

