2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00853
Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network

Abstract: Scene text detection, an important step of scene text reading systems, has witnessed rapid development with convolutional neural networks. Nonetheless, two main challenges still exist and hamper its deployment to real-world applications. The first problem is the trade-off between speed and accuracy. The second one is to model the arbitrary-shaped text instance. Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into conside…

Cited by 454 publications (367 citation statements)
References 46 publications
“…Images of each concept are quality-controlled and human-annotated. ImageNet pre-training has been widely used in various computer vision tasks, such as fine-grained image classification (Cui et al., 2018; Fu et al., 2017; Russakovsky et al., 2015), object detection (Redmon et al., 2016; He et al., 2017), and scene text detection (Zhou et al., 2017; Wang et al., 2019b).…”
Section: Pre-training Datasets (mentioning)
confidence: 99%
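As a minimal illustration of the ImageNet pre-training mentioned in this excerpt, the sketch below loads an ImageNet-pretrained ResNet-18 through torchvision and drops its classification head so the convolutional trunk can serve as a detection backbone. The choice of ResNet-18 and the torchvision calls are illustrative assumptions, not details taken from the cited works.

```python
import torch
import torchvision

# Load a ResNet-18 initialized with ImageNet weights (torchvision >= 0.13 API).
# ResNet-18 is an arbitrary choice here; the cited detectors use other backbones.
weights = torchvision.models.ResNet18_Weights.IMAGENET1K_V1
model = torchvision.models.resnet18(weights=weights)

# Drop the global average pool and classifier, keeping only the convolutional
# trunk that a detector's neck and head would consume.
backbone = torch.nn.Sequential(*list(model.children())[:-2])

features = backbone(torch.randn(1, 3, 224, 224))
print(features.shape)  # torch.Size([1, 512, 7, 7])
```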
“…Comparisons on Total-Text: As shown in Table XI, the Recall of our proposed method outperforms the state-of-the-art methods that do not pre-train on the synthetic dataset SynthText [12]. Specifically, our model improves Recall by 1.8% compared with the detector PAN [41] under single-scale testing. When we use a ResNet-101 pre-trained on OpenImages to initialize our model and test with multiple scales, our method also outperforms other methods that use multi-scale testing strategies and pre-train on SynthText.…”
Section: E. Comparisons With Related Methods (mentioning)
confidence: 85%
“…For the segmentation-based methods, some researchers directly segment the text regions from the entire input image. Instead of only performing semantic segmentation for each pixel, more sophisticated approaches have been proposed to learn additional attributes, such as learning the link relationships among pixels [39], predicting the text border [24], learning the geometric attributes [7], [8], [27]-[32] of each pixel, constructing text instances with progressive scale expansion [9], and pulling pixels of the same text instance together while pushing pixels of different text instances apart [40], [41]. Besides, some methods perform segmentation only within the bounding box.…”
Section: Related Work (mentioning)
confidence: 99%
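The pull/push idea in this excerpt ([40], [41]) can be made concrete as a small embedding loss. The sketch below is an assumed, minimal PyTorch formulation, not the exact loss from either paper: each pixel's similarity vector is pulled toward the mean embedding of its text instance, and instance means are pushed apart up to a margin; the log(1 + d^2) penalty and the margin values mirror the general style of such losses.

```python
import torch

def pull_push_loss(embeddings, instance_masks, delta_pull=0.5, delta_push=3.0):
    """Pull pixels toward their instance's mean embedding; push instance
    means apart. embeddings: (C, H, W); instance_masks: non-empty (H, W) bools."""
    means, pull = [], embeddings.new_zeros(())
    for mask in instance_masks:
        inst = embeddings[:, mask]                # (C, N) pixels of one instance
        mean = inst.mean(dim=1, keepdim=True)     # (C, 1) instance embedding
        dist = (inst - mean).norm(dim=0)          # per-pixel distance to the mean
        # penalize only pixels farther than delta_pull from their instance mean
        pull = pull + torch.log1p(torch.clamp(dist - delta_pull, min=0) ** 2).mean()
        means.append(mean.squeeze(1))
    pull = pull / max(len(instance_masks), 1)

    push = embeddings.new_zeros(())
    if len(means) > 1:
        m = torch.stack(means)                    # (K, C) instance embeddings
        margin = torch.clamp(delta_push - torch.cdist(m, m), min=0)
        eye = torch.eye(len(m), dtype=torch.bool, device=m.device)
        margin = margin.masked_fill(eye, 0.0)     # no self-repulsion
        push = torch.log1p(margin ** 2).sum() / (len(m) * (len(m) - 1))
    return pull + push
```

At inference, pixels would be assigned to the nearest instance embedding, which is how losses of this family turn per-pixel segmentation into instance separation.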
“…It has a network architecture similar to YoloV3, composed of a backbone, neck, and head. The feature extraction backbone used in YoloV4 is CSPDarknet, and its neck uses a two-way FPN structure called PAN [56], instead of a traditional FPN, to fuse the multi-layer feature maps. FSM was then applied to the PAN architecture of YoloV4.…”
Section: YoloV4 With FSM (mentioning)
confidence: 99%
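The two-way neck this excerpt describes, a top-down FPN pass followed by an extra bottom-up aggregation path, can be sketched compactly. The module below is a minimal assumed version in PyTorch; the channel widths, fusion by addition, and nearest-neighbor upsampling are illustrative choices and do not reproduce the exact YoloV4 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoWayFPN(nn.Module):
    """Top-down FPN pass plus an extra bottom-up path (PAN-style neck).
    Assumes each input map halves the spatial size of the previous one."""

    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        self.down = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)
            for _ in in_channels[:-1])

    def forward(self, feats):
        # feats: backbone maps ordered fine -> coarse, e.g. strides 8, 16, 32
        lat = [conv(f) for conv, f in zip(self.lateral, feats)]
        # top-down pass: upsample each coarser map and add it to the finer one
        for i in range(len(lat) - 2, -1, -1):
            lat[i] = lat[i] + F.interpolate(
                lat[i + 1], size=lat[i].shape[-2:], mode="nearest")
        # extra bottom-up pass: downsample each fused map into the next level
        outs = [lat[0]]
        for i in range(len(lat) - 1):
            outs.append(lat[i + 1] + self.down[i](outs[-1]))
        return outs

# usage: maps from a backbone at strides 8/16/32
# outs = TwoWayFPN()([torch.randn(1, 256, 64, 64),
#                     torch.randn(1, 512, 32, 32),
#                     torch.randn(1, 1024, 16, 16)])
```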