Open Images V5 Text Annotation and Yet Another Mask Text Spotter

Krylov, Ilya; Nosov, S. K.; Sovrasov, Vladislav

doi:10.48550/arxiv.2106.12326

Cited by 2 publications

(3 citation statements)

References 16 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Dai et al [79] introduced the Region-based Fully Convolutional Network (R-FCN), which uses fully connected layers that share almost all processing across the entire image instead of convolutional layers. They proposed position-sensitive score maps to address the translation invariance problem, which includes:…”

Section: ) R-fcnmentioning

confidence: 99%

Object Detection Performance: A Comparative Study

Qaddour

2023

Preprint

View full text Add to dashboard Cite

Object detection is a critical task in computer vision with applications in many domains. Recent advances in deep learning have led to significant improvements in the performance of object detectors. This paper presents a comparative performance analysis of generic object detectors, with a focus on single-stage and two-stage detectors. The paper first discusses the taxonomy of object detection algorithms, and then presents a detailed performance comparison of single-stage and two-stage detectors. The performance of different detectors was evaluated on two different datasets, Microsoft COCO and PASCAL VOC 2012. The results showed that DetectoRS is a state-of-the-art two-stage object detector, outperforms all other two-stage models. While YOLOv4 and FCOS are the two most accurate single-stage detectors. The comparative results also show that single-stage detectors are generally less accurate than two-stage detectors, but they are typically faster. The paper also includes the strengths and weaknesses of different object detection approaches and identifies promising directions for future research.

show abstract

Section: ) R-fcnmentioning

confidence: 99%

Object Detection Performance: A Comparative Study

Qaddour

2023

Preprint

View full text Add to dashboard Cite

show abstract

“…It was replaced with RoIAlign [11] that used a bilinear interpolation for weighted feature sampling, that was also extended for the first time for sampling non axis-aligned (i.e., rotated ) RoIs [27]. For sampling arbitrarily shaped text, further extensions [33,16,20] added a background mask to the sampling operation for isolating the extracted word only, often relying on segmentation-based detectors or masks.…”

Section: Background and Related Workmentioning

confidence: 99%

“…The components in this approach are mostly explored independently in the literature, isolating either the word detection performance (ignoring transcripts) [47,4,2,21,39], or the recognition performance over datasets composed of word-crop images [1,41,25,31]. The second approach is a combined End-to-End (E2E) architecture, adding a recognition branch that operates directly on the detection model's latent features [8,3,16,36,33,20,29]. Feature sampling replaces cropping, allowing detection and recognition to be jointly trained E2E.…”

Section: Introductionmentioning

confidence: 99%

GLASS: Global to Local Attention for Scene-Text Spotting

Ronen¹,

Tsiper²,

Anschel³

et al. 2022

Preprint

View full text Add to dashboard Cite

In recent years, the dominant paradigm for text spotting is to combine the tasks of text detection and recognition into a single endto-end framework. Under this paradigm, both tasks are accomplished by operating over a shared global feature map extracted from the input image. Among the main challenges that end-to-end approaches face is the performance degradation when recognizing text across scale variations (smaller or larger text), and arbitrary word rotation angles. In this work, we address these challenges by proposing a novel global-to-local attention mechanism for text spotting, termed GLASS , that fuses together global and local features. The global features are extracted from the shared backbone, preserving contextual information from the entire image, while the local features are computed individually on resized, high resolution rotated word crops. The information extracted from the local crops alleviates much of the inherent difficulties with scale and word rotation. We show a performance analysis across scales and angles, highlighting improvement over scale and angle extremities. In addition, we introduce an orientation-aware loss term supervising the detection task, and show its contribution to both detection and recognition performance across all angles. Finally, we show that GLASS is general by incorporating it into other leading text spotting architectures, improving their text spotting performance. Our method achieves state-of-the-art results on multiple benchmarks, including the newly released TextOCR.

show abstract

Open Images V5 Text Annotation and Yet Another Mask Text Spotter

Cited by 2 publications

References 16 publications

Object Detection Performance: A Comparative Study

Object Detection Performance: A Comparative Study

GLASS: Global to Local Attention for Scene-Text Spotting

Contact Info

Product

Resources

About