2021
DOI: 10.1109/tip.2021.3064256

Attend and Guide (AG-Net): A Keypoints-Driven Attention-Based Deep Network for Image Recognition

Abstract: This paper presents a novel keypoints-based attention mechanism for visual recognition in still images. Deep Convolutional Neural Networks (CNNs) for recognizing images with distinctive classes have shown great success, but their performance in discriminating fine-grained changes is not at the same level. We address this by proposing an end-to-end CNN model, which learns meaningful features linking fine-grained changes using our novel attention mechanism. It captures the spatial structures in images by identif…

Citations: cited by 30 publications (15 citation statements)
References: 58 publications (141 reference statements)

“…The computer vision-based sports action recognition systems can provide rapid postmatch analysis and real-time objective feedback before the next race for coaches and players. Bera et al. [12] pointed out that the fundamental points of athletes' actions could be captured via three-dimensional video shooting techniques. Yu and Chin [13] recorded athletes' time-spatial action images via radio frequency technology in IoT, which turned out to be blurred and could not be seen clearly.…”
Section: Introduction (mentioning)
confidence: 99%
“…[9] presented an ensemble of four CNN models to handle different parts of the driver, including the face, hands, and body, to recognize driver activity. [38] proposed an attend and guide network to classify driver behavior by obtaining the spatial structures of images through the identification of semantic regions and their spatial distributions. [39] concatenated three CNN models to construct a hybrid framework for detecting distracted driver behavior.…”
Section: Related Work (mentioning)
confidence: 99%
“…Moreover, part-based methods limit both scalability and practicality of real-world FGVC applications. Thus, many recent methods have used image-level labels to guide their models in identifying the key object parts to discriminate the sub-categories by exploring attention mechanisms in the image space or feature space [7]-[10] to automatically mine discriminative features.…”
Section: Introduction (mentioning)
confidence: 99%