IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

Yang, Qimeng; Cheng, Mengli; Zhou, Wenmeng; Chen, Yan; Qiu, Minghui; Lin, Wei; Chu, Wei

doi:10.48550/arxiv.1805.01167

Cited by 18 publications (20 citation statements)

References 16 publications


“…Similarly, Huang et al [32] update feature extraction network derived from Pyramid Attention Network [33], and add a text mask prediction branch that detects curved texts. In addition, Yang et al [34] and Dai et al [35] use a fully convolutional instance-aware semantic segmentation (FCIS) [36] method to guide the prediction of three text-related elements: mask, class, and box, by generating an instance-aware segmentation perspective. Wang et al [37] propose an Adaptive-RPN with a scale-insensitive metric to accurately generate proposal bounding boxes, and then add contour characteristic of text regions by executing the convolution operation in two orthogonal directions to locate texts with arbitrary shapes.…”

Section: Combination of Segmentation and Regression Methods; classified as mentioning (confidence: 99%)

Industrial Scene Text Detection with Refined Feature-attentive Network

Guan, Gu, et al., 2021 (preprint)


Detecting the marking characters of industrial metal parts remains challenging due to low visual contrast, uneven illumination, corroded character structures, and cluttered backgrounds in metal part images. Affected by these factors, most existing methods generate bounding boxes that locate low-contrast text areas inaccurately. In this paper, we propose a refined feature-attentive network (RFN) to solve this inaccurate localization problem. Specifically, we design a parallel feature integration mechanism that constructs an adaptive feature representation from multi-resolution features, enhancing the perception of multi-scale texts at each scale-specific level to generate a high-quality attention map. Then, an attentive refinement network guided by the attention map is developed to rectify the location deviation of candidate boxes. In addition, a re-scoring mechanism is designed to select the text boxes with the best rectified locations. Moreover, we construct two industrial scene text datasets, containing a total of 102,156 images and 1,948,809 text instances with various character structures and metal parts. Extensive experiments on our datasets and four public datasets demonstrate that the proposed method achieves state-of-the-art performance. Both code and datasets are available at: https://github.com/TongkunGuan/RFN.



“…
Method                Precision   Recall   F-measure
SegLink [37]              73.10    76.80       75.00
SSTD [8]                  80.00    73.00       77.00
WordSup [12]              79.33    77.03       78.16
EAST * [46]               83.27    78.33       80.72
R2CNN [17]                85.62    79.68       82.54
DDR [10]                  82.00    80.00       81.00
Lyu et al * [31]          89.50    79.70       84.30
RRD * [24]                88.00    80.00       83.80
TextBoxes++ * [22]        87.80    78.50       82.90
PixelLink [3]             85.50    82.00       83.70
FOTS [26]                 91.00    85.17       87.99
IncepText * [42]          89.40    84.30       86.80
TextSnake [29]            84.90    80.40       82.60
FTSN [2]                  88.60    80.00       84.10
SPCNET [41]               88…”

Section: Methods; classified as mentioning (confidence: 99%)
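The quoted table reports precision, recall and F-measure for each method. Assuming the F-measure here is the usual harmonic mean of precision and recall (the quote itself does not define it), the minimal sketch below recomputes it for a few rows. Because the precision and recall values are rounded, the recomputed F-measure can differ slightly from the reported one (about 86.78 versus 86.80 for IncepText); `f_measure` and `ROWS` are illustrative names, not from the source.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# (method, precision, recall, reported F-measure) copied from the quoted table
ROWS = [
    ("WordSup [12]", 79.33, 77.03, 78.16),
    ("EAST * [46]", 83.27, 78.33, 80.72),
    ("FOTS [26]", 91.00, 85.17, 87.99),
    ("IncepText * [42]", 89.40, 84.30, 86.80),
]

if __name__ == "__main__":
    for name, p, r, reported in ROWS:
        print(f"{name:18s} recomputed={f_measure(p, r):6.2f} reported={reported:6.2f}")
```

The harmonic mean is dominated by the smaller of the two values, so a method cannot reach a high F-measure through precision or recall alone.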

Pyramid Mask Text Detector

Liu, Sheng, et al., 2019 (preprint)


Scene text detection, an essential step in scene text recognition systems, aims to locate text instances in natural scene images automatically. Some recent attempts, building on Mask R-CNN, formulate scene text detection as an instance segmentation problem and achieve remarkable performance. In this paper, we present a new Mask R-CNN based framework named Pyramid Mask Text Detector (PMTD) for scene text detection. Instead of the binary text masks generated by existing Mask R-CNN based methods, PMTD performs pixel-level regression under the guidance of location-aware supervision, yielding a more informative soft text mask for each text instance. To generate text boxes, PMTD reinterprets the obtained 2D soft mask in 3D space and introduces a novel plane clustering algorithm to derive the optimal text box from the 3D shape. Experiments on standard datasets demonstrate that PMTD brings consistent and noticeable gains and clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 80.13% on the ICDAR 2017 MLT dataset.
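The abstract only says that PMTD regresses a soft mask under "location-aware supervision". Assuming, as the detector's name suggests, that the target is pyramid-shaped (1 at the box centre, falling linearly towards 0 at the box border), the minimal sketch below builds such a target for an axis-aligned box. It is an illustration under that assumption, not PMTD's code: `pyramid_soft_mask` is a hypothetical helper, and PMTD's actual labels for rotated or quadrilateral boxes may be constructed differently.

```python
def pyramid_soft_mask(width: int, height: int) -> list[list[float]]:
    """Pyramid-shaped soft mask for an axis-aligned box of width x height pixels.

    Each pixel gets 1 - d, where d is the normalized L-infinity distance of the
    pixel centre from the box centre (0 at the centre, 1 on the box border).
    """
    if width <= 0 or height <= 0:
        raise ValueError("box size must be positive")
    cx, cy = width / 2.0, height / 2.0
    mask = []
    for row in range(height):
        y = row + 0.5  # pixel centre, row coordinate
        values = []
        for col in range(width):
            x = col + 0.5  # pixel centre, column coordinate
            d = max(abs(x - cx) / cx, abs(y - cy) / cy)
            values.append(1.0 - d)
        mask.append(values)
    return mask


if __name__ == "__main__":
    for line in pyramid_soft_mask(5, 3):
        print(" ".join(f"{v:.2f}" for v in line))
```

Because the level sets of the normalized L-infinity distance are rectangles, the mask viewed as a height map forms a four-sided pyramid, which is the kind of 3D shape a plane-fitting step could then exploit.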


“…SmoothL1 Loss. SmoothL1 loss function is one of the most common loss functions for the bounding box regression task, such as in [1,19,23,24,29,32,33], as defined below:…”

Section: Regression Loss Function; classified as mentioning (confidence: 99%)
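The quoted definition is cut off. For reference, the widely used Smooth L1 loss from Fast R-CNN is quadratic for small residuals and linear for large ones. The minimal sketch below adds a threshold parameter `beta` (an assumption; `beta = 1` gives the Fast R-CNN form 0.5x² for |x| < 1 and |x| − 0.5 otherwise), and `box_regression_loss` is a hypothetical helper that sums the loss over the regressed box coordinates.

```python
def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Smooth L1 of a single residual x: quadratic below beta, linear above."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    ax = abs(x)
    if ax < beta:
        return 0.5 * x * x / beta
    return ax - 0.5 * beta


def box_regression_loss(pred: list[float], target: list[float], beta: float = 1.0) -> float:
    """Sum of Smooth L1 over the regressed box coordinates (hypothetical helper)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(smooth_l1(p - t, beta) for p, t in zip(pred, target))


if __name__ == "__main__":
    # Small residuals are penalized quadratically, large ones linearly.
    print(box_regression_loss([0.1, 0.2, 1.5, -2.0], [0.0, 0.0, 0.0, 0.0]))
```

The two branches meet with equal value (0.5·beta) and equal slope at |x| = beta, so the loss is smooth near zero like L2 but, like L1, grows only linearly for outlying residuals.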

“…For the expression of regression terms of RBox, two main types are categorized. Following the idea in [32], the bounding box regression was categorized into two branches, which are direct regression and indirect regression. The indirect regression method is derived from R-CNN, computing a set of offsets using ground truth and prior boxes, as expressed in Eq.…”

Section: RBox Regression Parameters; classified as mentioning (confidence: 99%)
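The quote refers to the R-CNN-style "indirect" regression, whose equation is elided. In the standard R-CNN parameterization, the targets are the centre offsets normalized by the prior box size and the log-ratios of width and height. The minimal sketch below encodes and decodes such offsets. Treating the angle of a rotated box (RBox) as a plain difference θg − θp in radians is an assumption, since RBox methods differ in how they parameterize the angle; `RBox`, `encode` and `decode` are illustrative names.

```python
import math
from typing import NamedTuple


class RBox(NamedTuple):
    """Rotated box: centre (x, y), width w, height h, angle theta in radians."""
    x: float
    y: float
    w: float
    h: float
    theta: float


def encode(gt: RBox, prior: RBox) -> tuple[float, float, float, float, float]:
    """Offsets of a ground-truth box relative to a prior box (R-CNN style)."""
    tx = (gt.x - prior.x) / prior.w
    ty = (gt.y - prior.y) / prior.h
    tw = math.log(gt.w / prior.w)
    th = math.log(gt.h / prior.h)
    t_theta = gt.theta - prior.theta  # assumed angle term, not from the source
    return tx, ty, tw, th, t_theta


def decode(deltas: tuple[float, float, float, float, float], prior: RBox) -> RBox:
    """Inverse of encode: recover a box from offsets and its prior box."""
    tx, ty, tw, th, t_theta = deltas
    return RBox(
        x=prior.x + tx * prior.w,
        y=prior.y + ty * prior.h,
        w=prior.w * math.exp(tw),
        h=prior.h * math.exp(th),
        theta=prior.theta + t_theta,
    )


if __name__ == "__main__":
    prior = RBox(50.0, 50.0, 20.0, 10.0, 0.0)
    gt = RBox(54.0, 48.0, 40.0, 10.0, 0.1)
    deltas = encode(gt, prior)
    print(deltas)
    print(decode(deltas, prior))
```

Normalizing by the prior size and using log-ratios keeps the targets roughly scale-invariant, and decoding with the same prior recovers the ground-truth box, which is what makes these offsets usable as regression targets.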

A DCNN-based Arbitrarily-Oriented Object Detector for Quality Control and Inspection Application

Yao, Ortiz, Bonnín-Pascual, 2021 (preprint)


Following the success of machine vision systems in on-line automated quality control and inspection processes, this work presents an object recognition solution for two specific applications: the detection of quality control items in surgery toolboxes prepared for sterilization in a hospital, and the detection of defects in vessel hulls to prevent potential structural failures. The solution has two stages. First, a feature pyramid architecture based on the Single Shot MultiBox Detector (SSD) is used to improve detection performance, and a statistical analysis of the ground truth is employed to select the parameters of a range of default boxes. Second, a lightweight neural network is used to produce oriented detection results by regression. The first stage is capable of detecting the small targets present in both scenarios. Despite its simplicity, the second stage detects elongated targets effectively while maintaining high running efficiency.

