2023
DOI: 10.3390/app13105901

Gaze Estimation via Strip Pooling and Multi-Criss-Cross Attention Networks

Abstract: Deep learning techniques for gaze estimation usually determine gaze direction directly from images of the face. These algorithms achieve good performance because face images contain more feature information than eye images. However, these image classes contain a substantial amount of redundant information that may interfere with gaze prediction and may represent a bottleneck for performance improvement. To address these issues, we model long-distance dependencies between the eyes via Strip Pooling and Multi-Cr…

Cited by 3 publications (2 citation statements)
References 30 publications
“…Each branch includes a fully connected layer that generates continuous as well as discrete predictions. Yan et al [27] improve on this architecture by modifying the backbone to employ strip pooling which makes the receptive field more suited to gaze estimation. They also incorporate multi-criss-cross attention to capture dependencies between the eye features.…”
Section: Related Work (mentioning)
confidence: 99%
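
The citation statement above says the backbone employs strip pooling to reshape the receptive field. As a rough illustration of the general idea only, the sketch below averages a 2D feature map along full-width and full-height strips and uses the fused result as a sigmoid gate on the input. It is not the authors' implementation: the function names `strip_pool` and `strip_pool_gate` are hypothetical, the feature map is a plain list of rows, and the learned convolutions that such a module would normally apply to the pooled strips are left out.

```python
import math


def strip_pool(feature_map):
    """Average a 2D feature map (list of equal-length rows) along strips.

    Returns (row_means, col_means): one value per horizontal strip (row)
    and one value per vertical strip (column).
    """
    height = len(feature_map)
    width = len(feature_map[0])
    row_means = [sum(row) / width for row in feature_map]
    col_means = [
        sum(feature_map[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    return row_means, col_means


def strip_pool_gate(feature_map):
    """Gate each position by a sigmoid of its row strip plus its column strip.

    Assumption: a real module would pass the pooled strips through learned
    1D convolutions before fusing them; that step is omitted here.
    """
    row_means, col_means = strip_pool(feature_map)
    gated = []
    for i, row in enumerate(feature_map):
        gated_row = []
        for j, value in enumerate(row):
            fused = row_means[i] + col_means[j]  # broadcast-style sum
            weight = 1.0 / (1.0 + math.exp(-fused))  # sigmoid gate in (0, 1)
            gated_row.append(value * weight)
        gated.append(gated_row)
    return gated
```

Each output position depends on its entire row and entire column, which is what gives strip pooling a long, narrow receptive field. For face images, a single horizontal strip can span both eyes.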
“…L2CS-Net [1] uses a ResNet-50 as backbone and employs 2 different prediction heads for the yaw and pitch angles. SPMCCA-Net [27] integrates strip pooling and criss-cross attention to the ResNet backbone. DAM [12] predicts the head pose and extracts the features of the eye crops to make the gaze prediction, however it benefits from additional annotations.…”
Section: A Evaluation In the Wild (mentioning)
confidence: 99%