2022
DOI: 10.3390/app12136425

Scene Text Detection Using Attention with Depthwise Separable Convolutions

Abstract: In spite of significant research efforts, the existing scene text detection methods fall short of the challenges and requirements posed in real-life applications. In natural scenes, text segments exhibit a wide range of shape complexities, scale, and font property variations, and they appear mostly incidental. Furthermore, the computational requirement of the detector is an important factor for real-time operation. To address the aforementioned issues, the paper presents a novel scene text detector using a dee…

Cited by 9 publications (5 citation statements) · References 73 publications (95 reference statements)
“…Attention mechanisms have become an indispensable tool for designing advanced deep-learning models across various tasks and domains. To preserve more text feature information during feature extraction, we propose an improved attention feature fusion module (DSAF) based on AFF [32], which uses depthwise separable convolution [33] and is embedded into the feature extraction network ResNet [34] to reduce the loss of feature information and increase the attention paid to features of different scales. The structure of this module is illustrated in Figure 2.…”
Section: Methods (mentioning)
confidence: 99%
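To make the quoted idea concrete, the sketch below combines a depthwise separable convolution with a simple channel-attention gate to fuse two feature maps. It is a hypothetical illustration of the general technique only; the class names and the gating design are assumptions and do not reproduce the cited paper's DSAF/AFF module.

```python
# Hypothetical sketch: attention-based fusion of two feature maps built on
# depthwise separable convolutions. Illustrative only, not the cited DSAF.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class SimpleAttentionFusion(nn.Module):
    """Fuse two same-shaped feature maps with a learned per-channel gate."""
    def __init__(self, channels):
        super().__init__()
        self.local = DepthwiseSeparableConv(channels, channels)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # global context per channel
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x, y):
        z = self.local(x + y)                  # mix the two inputs
        w = self.gate(z)                       # channel attention weights in [0, 1]
        return w * x + (1 - w) * y             # weighted fusion of the two maps

feat_a = torch.randn(1, 64, 32, 32)
feat_b = torch.randn(1, 64, 32, 32)
fused = SimpleAttentionFusion(64)(feat_a, feat_b)
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```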
“…The difficulty in this detection task was that the detection frame rate of the model needed to be higher than the video frame rate in order to achieve real-time detection. Current designs for lightweight networks were mainly applied in the following areas: the first was the lightweight design of convolutional layers, such as depthwise separable convolution [68, 69, 70]. The second was the design of convolutional modules, e.g., the Fire module used in SqueezeNet to achieve lightweighting by reducing the network parameters [71, 72].…”
Section: Methods (mentioning)
confidence: 99%
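The lightweighting argument in the quote above comes down to parameter counts: a depthwise separable convolution factors a standard convolution into a per-channel spatial filter plus a 1x1 pointwise mix. A quick, illustrative comparison, with arbitrary channel sizes chosen for the example:

```python
# Back-of-the-envelope parameter comparison between a standard 3x3
# convolution and its depthwise separable counterpart (illustrative only).
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

in_ch, out_ch, k = 256, 256, 3

standard = nn.Conv2d(in_ch, out_ch, k, padding=1, bias=False)

separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, k, padding=1, groups=in_ch, bias=False),  # depthwise
    nn.Conv2d(in_ch, out_ch, 1, bias=False),                          # pointwise
)

print(count_params(standard))   # 256*256*3*3 = 589,824
print(count_params(separable))  # 256*3*3 + 256*256 = 67,840 (~8.7x fewer)
```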
“…After CNNs were used to extract features, the performance of the STD model began to depend on the design of special components, such as the Region Proposal Network (RPN), the Feature Pyramid Network (FPN) [13,14], anchors, and other factors [15,16]. These algorithms required substantial prior knowledge and complex post-processing steps.…”
Section: Related Work (mentioning)
confidence: 99%
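For context on the "special components" mentioned above, a minimal FPN-style top-down merge is sketched below. It is a generic illustration of the FPN idea with assumed channel counts, not the exact design used by the cited detectors.

```python
# Minimal sketch of an FPN-style top-down pathway: 1x1 lateral projections
# plus upsample-and-add merging. Channel counts and levels are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.laterals = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):  # feats ordered fine-to-coarse, e.g. [C3, C4, C5]
        laterals = [l(f) for l, f in zip(self.laterals, feats)]
        # Merge from the coarsest (last) level down to the finest (first).
        for i in range(len(laterals) - 1, 0, -1):
            up = F.interpolate(laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
            laterals[i - 1] = laterals[i - 1] + up
        return [s(l) for s, l in zip(self.smooth, laterals)]

c3 = torch.randn(1, 256, 64, 64)
c4 = torch.randn(1, 512, 32, 32)
c5 = torch.randn(1, 1024, 16, 16)
pyramid = TinyFPN()([c3, c4, c5])
print([p.shape[-1] for p in pyramid])  # [64, 32, 16]
```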
“…Nowadays, more and more image research adopts the Transformer and abandons the traditional CNN [16]. The Vision Transformer (ViT) [26] adapted the Transformer to image classification.…”
Section: Related Work (mentioning)
confidence: 99%
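As a rough illustration of what ViT-style processing involves, the sketch below shows only the patch-embedding step (splitting the image into patches and projecting them to tokens); the sizes are illustrative assumptions rather than a specific published ViT configuration.

```python
# Minimal sketch of ViT-style patch embedding: an image is split into
# non-overlapping patches and each patch is linearly projected to a token.
import torch
import torch.nn as nn

img_size, patch, dim = 224, 16, 768
num_patches = (img_size // patch) ** 2  # 14 * 14 = 196

# A strided convolution implements "unfold into patches + linear projection".
to_tokens = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

x = torch.randn(1, 3, img_size, img_size)
tokens = to_tokens(x).flatten(2).transpose(1, 2)   # shape (1, 196, 768)

# Prepend a learnable [CLS] token and add positional embeddings; the resulting
# sequence would then be fed to a standard Transformer encoder for classification.
cls = nn.Parameter(torch.zeros(1, 1, dim))
pos = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
sequence = torch.cat([cls.expand(1, -1, -1), tokens], dim=1) + pos
print(sequence.shape)  # torch.Size([1, 197, 768])
```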