Learning Shape-Aware Embedding for Scene Text Detection

Tian, Zhuotao; Shu, Michelle; Lyu, Pengyuan; Li, Ruiyu; Zhou, Chao; Shen, Xiaoyong; Jia, Jiaya

doi:10.1109/cvpr.2019.00436

Cited by 215 publications

(108 citation statements)

References 37 publications

Supporting

Mentioning

101

Contrasting


“…In this section, we compare our network with the state-of-the-art approaches [1], [3], [10], [11], [16], [18], [20], [21], [23], [24], [27]-[29], [43], [45], [49], [65], [66], [69], [69], [71]-[73] on six different benchmark datasets. We consider recall, precision, and f-measure as the metrics for evaluation of accuracy of detection.…”

Section: Comparison With State-of-the-art Results (mentioning)

confidence: 99%

“…They use oriented region proposal network and oriented region-of-interest pooling layer to map arbitrary-oriented region proposals to a feature tensor for text classification. Tian et al project pixels onto an embedding space, where they consider pixels of same text instances appear closer to each other [23]. The authors in [24] incorporate normalization of scale and orientation of text instances to map to a desired canonical geometry range.…”

Section: A Scene Text Detection (mentioning)

confidence: 99%
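The idea in the statement above, that pixels of the same text instance lie close together in an embedding space, can be illustrated with a toy sketch. This is not the post-processing used by Tian et al.; the function name `group_pixels`, the fixed `centers`, and the `max_dist` threshold are all hypothetical choices made only for illustration.

```python
import math


def group_pixels(embeddings, centers, max_dist):
    """Assign each pixel embedding to the nearest center within max_dist.

    Returns one label per pixel: the index of the chosen center, or -1 when
    no center is close enough. Assumes `centers` is non-empty.
    """
    labels = []
    for e in embeddings:
        # Euclidean distance from this pixel's embedding to every center.
        dists = [math.dist(e, c) for c in centers]
        best = min(range(len(centers)), key=dists.__getitem__)
        labels.append(best if dists[best] <= max_dist else -1)
    return labels


# Toy 2-D embeddings: two tight groups and one outlier (made-up values).
pixels = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (4.9, 5.0), (10.0, -3.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
labels = group_pixels(pixels, centers, max_dist=1.0)
print(labels)
```

In this sketch the centers are given up front; a real detector would have to derive them from the image, which is outside what the quoted statement describes.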


Cluttered TextSpotter: An End-to-End Trainable Light-Weight Scene Text Spotter for Cluttered Environment

2020


Scene text spotting aims at simultaneously localizing and recognizing text instances, symbols, and logos in natural scene images. Scene text detection and recognition approaches have received immense attention in computer vision research community. The presence of partial occlusion or truncation artifact due to the cluttered background of scene images creates an obstacle in perceiving the text instances, which makes the process of spotting very complex. In this paper, we propose a lightweight scene text spotter that can address the issue of cluttered environment of scene images. It is an end-to-end trainable deep neural network that uses local part information, global structural features, and context cue information of oriented region proposals for spotting text instances. It helps to localize in scene images with background clutters, where partially occluded text parts, truncation artifacts, and perspective distortions are present. We mitigate the problem of misclassification caused by inter-class interference by exploring inter-class separability and intra-class compactness. We also incorporate multi-language character segmentation and word-level recognition in a lightweight recognition module. We have used six publicly available benchmark datasets in different smart devices to illustrate the efficacy of the network.


“…
Method           R     P     F
…Ext             85.3  67.9  75.6
Wang et al [9]   80.2  80.1  80.1
Tian et al [13]  77.8  82.7  80.1
PSENet [12]      79.7  84.8  82.2
DB [12]          80.2  86.9  83.4
Ours             80.3  84.9  82.5
The single-scale results on CTW1500. "R", "P" and "F" represent the recall, precision, and F-measure respectively.…”

Section: Methods (mentioning)

confidence: 99%
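As a quick check on the figures quoted above, the F-measure is the harmonic mean of recall and precision. The sketch below assumes the three numbers in each row are in R, P, F order, as the caption lists them; the helper name `f_measure` is hypothetical. Because the inputs are already rounded to one decimal, a recomputed value can differ from the reported one in the last digit.

```python
def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (same units as the inputs)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Rows as quoted for CTW1500: (recall, precision, reported F-measure).
rows = {
    "PSENet": (79.7, 84.8, 82.2),
    "DB": (80.2, 86.9, 83.4),
    "Ours": (80.3, 84.9, 82.5),
}
for name, (r, p, f_reported) in rows.items():
    print(f"{name}: recomputed {f_measure(r, p):.1f}, reported {f_reported}")
```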

“…However, it has many output kernels which may have negative effects on location results. [13] adopts a mirror symmetry of FPN [14] to produce embedding features and text foreground masks, and uses cluster processing to detect texts. DB [15] proposes a Differentiable Binarization module to predict the shrunk regions, and the shrunk regions are dilated with an constant expanding ratio.…”

Section: Related Work (mentioning)

confidence: 99%


Arbitrary-Shaped Text Detection With Adaptive Text Region Representation

Jiang, Zhang, et al. 2020. IEEE Access


Text detection/localization, as an important task in computer vision, has witnessed substantial advancements in methodology and performance with convolutional neural networks. However, the vast majority of popular methods use rectangles or quadrangles to describe text regions. These representations have inherent drawbacks, especially relating to dense adjacent text and loose regional text boundaries, which usually cause difficulty detecting arbitrarily shaped text. In this paper, we propose a novel text region representation method, with a robust pipeline, which can precisely detect dense adjacent text instances with arbitrary shapes. We consider a text instance to be composed of an adaptive central text region mask and a corresponding expanding ratio between the central text region and the full text region. More specifically, our pipeline generates adaptive central text regions and corresponding expanding ratios with a proposed training strategy, followed by a new proposed post-processing algorithm which expands central text regions to the complete text instance with the corresponding expanding ratios. We demonstrated that our new text region representation is effective, and that the pipeline can precisely detect closely adjacent text instances of arbitrary shapes. Experimental results on common datasets demonstrate superior performance of our work.

Index terms: Scene text detection, arbitrary-shaped, text region representation, deformable convolutional network


Integrating multiple MRI sequences for pelvic organs segmentation via the attention mechanism

et al. 2021


Purpose: To create a network which fully utilizes multi‐sequence MRI and compares favorably with manual human contouring. Methods: We retrospectively collected 89 MRI studies of the pelvic cavity from patients with prostate cancer and cervical cancer. The dataset contained 89 samples from 87 patients with a total of 84 valid samples. MRI was performed with T1‐weighted (T1), T2‐weighted (T2), and Enhanced Dixon T1‐weighted (T1DIXONC) sequences. There were two cohorts. The training cohort contained 55 samples and the testing cohort contained 29 samples. The MRI images in the training cohort contained contouring data from radiotherapist α. The MRI images in the testing cohort contained contouring data from radiotherapist α and contouring data from another radiotherapist: radiotherapist β. The training cohort was used to optimize the convolution neural networks, which included the attention mechanism through the proposed activation module and the blended module into multiple MRI sequences, to perform autodelineation. The testing cohort was used to assess the networks’ autodelineation performance. The contoured organs at risk (OAR) were the anal canal, bladder, rectum, femoral head (L), and femoral head (R). Results: We compared our proposed network with UNet and FuseUNet using our dataset. When T1 was the main sequence, we input three sequences to segment five organs and evaluated the results using four metrics: the DSC (Dice similarity coefficient), the JSC (Jaccard similarity coefficient), the ASD (average mean distance), and the 95% HD (robust Hausdorff distance). The proposed network achieved improved results compared with the baselines among all metrics. The DSC were 0.834±0.029, 0.818±0.037, and 0.808±0.050 for our proposed network, FuseUNet, and UNet, respectively. The 95% HD were 7.256±2.748 mm, 8.404±3.297 mm, and 8.951±4.798 mm for our proposed network, FuseUNet, and UNet, respectively. Our proposed network also had superior performance on the JSC and ASD coefficients. Conclusion: Our proposed activation module and blended module significantly improved the performance of FuseUNet for multi‐sequence MRI segmentation. Our proposed network integrated multiple MRI sequences efficiently and autosegmented OAR rapidly and accurately. We also discovered that three‐sequence fusion (T1‐T1DIXONC‐T2) was superior to two‐sequence fusion (T1‐T2 and T1‐T1DIXONC, respectively). We infer that the more MRI sequences fused, the better the automatic segmentation results.
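Two of the metrics named in this abstract, the DSC and the JSC, have standard definitions on binary masks. The sketch below is a minimal illustration on made-up one-dimensional masks, not the evaluation code of the cited study; the function name `dice_and_jaccard` and the convention for two empty masks are assumptions.

```python
def dice_and_jaccard(pred, truth):
    """Return (DSC, JSC) for two equal-length binary masks (sequences of 0/1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    n_pred = sum(1 for p in pred if p)
    n_truth = sum(1 for t in truth if t)
    union = n_pred + n_truth - inter
    if union == 0:
        # Both masks empty: treated here as perfect agreement (an assumption).
        return 1.0, 1.0
    dsc = 2 * inter / (n_pred + n_truth)  # Dice: 2|A∩B| / (|A| + |B|)
    jsc = inter / union                   # Jaccard: |A∩B| / |A∪B|
    return dsc, jsc


# Made-up flattened masks that overlap on two of their three foreground pixels.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
dsc, jsc = dice_and_jaccard(pred, truth)
print(f"DSC={dsc:.3f}, JSC={jsc:.3f}")
```

The two scores are monotonically related (DSC = 2·JSC / (1 + JSC)), so they rank segmentations identically even though their values differ.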
