Gaze Estimation via Strip Pooling and Multi-Criss-Cross Attention Networks

Yan, Chao; Pan, Weiguo; Xu, Cheng; Dai, Songyin; Li, Xuewei

doi:10.3390/app13105901

Cited by 3 publications

(2 citation statements)

References 30 publications

“…Each branch includes a fully connected layer that generates continuous as well as discrete predictions. Yan et al [27] improve on this architecture by modifying the backbone to employ strip pooling which makes the receptive field more suited to gaze estimation. They also incorporate multi-criss-cross attention to capture dependencies between the eye features.…”

Section: Related Work (mentioning)

confidence: 99%
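
The strip pooling mentioned in the statement above can be illustrated with a minimal sketch. It assumes a single-channel feature map stored as a list of rows, and it shows only the pooling and fusion steps: each row is averaged into a horizontal strip, each column into a vertical strip, and the two are summed back onto the full grid. The function names are hypothetical, and the learned 1D convolutions and sigmoid gating that a full strip pooling module would normally apply are omitted, so this is not the cited paper's implementation.

```python
def strip_pool(feature_map):
    """Return (row_means, col_means) for a 2D feature map given as a list of rows.

    row_means[i] averages row i over its full width (horizontal strip).
    col_means[j] averages column j over its full height (vertical strip).
    """
    height = len(feature_map)
    width = len(feature_map[0])
    row_means = [sum(row) / width for row in feature_map]
    col_means = [
        sum(feature_map[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    return row_means, col_means


def fuse_strips(row_means, col_means):
    """Broadcast both strips back to H x W and add them element-wise."""
    return [[r + c for c in col_means] for r in row_means]


# Illustrative 2 x 2 input (made-up values, not data from any paper).
x = [[1.0, 2.0],
     [3.0, 4.0]]
rows, cols = strip_pool(x)
fused = fuse_strips(rows, cols)
```

Because each strip spans an entire row or column, every output position mixes information from across the whole image along one axis, which is the long, narrow receptive field the statement refers to.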

“…L2CS-Net [1] uses a ResNet-50 as backbone and employs 2 different prediction heads for the yaw and pitch angles. SPMCCA-Net [27] integrates strip pooling and criss-cross attention to the ResNet backbone. DAM [12] predicts the head pose and extracts the features of the eye crops to make the gaze prediction, however it benefits from additional annotations.…”

Section: A Evaluation In the Wild (mentioning)

confidence: 99%

From Face to Gait: Weakly-Supervised Learning of Gender Information from Walking Patterns

Catruna

Cosma

Radoi

2021

2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)

Gaze estimation, the task of predicting where an individual is looking, is a critical task with direct applications in areas such as human-computer interaction and virtual reality. Estimating the direction of looking in unconstrained environments is difficult, due to the many factors that can obscure the face and eye regions. In this work we propose CrossGaze, a strong baseline for gaze estimation, that leverages recent developments in computer vision architectures and attention-based modules. Unlike previous approaches, our method does not require a specialized architecture, utilizing already established models that we integrate in our architecture and adapt for the task of 3D gaze estimation. This approach allows for seamless updates to the architecture as any module can be replaced with more powerful feature extractors. On the Gaze360 benchmark, our model surpasses several state-of-the-art methods, achieving a mean angular error of 9.94°. Our proposed model serves as a strong foundation for future research and development in gaze estimation, paving the way for practical and accurate gaze prediction in real-world scenarios.

Section: Related Work (mentioning)

confidence: 99%
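
The mean angular error quoted in the abstract above is commonly computed as the average angle between predicted and ground-truth 3D gaze vectors. The sketch below assumes that definition and represents each gaze direction as a 3-tuple. The function names are hypothetical, and the exact evaluation protocol used for Gaze360 may differ in details such as which samples are included.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two non-zero 3D gaze direction vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_target = math.sqrt(sum(t * t for t in target))
    if norm_pred == 0.0 or norm_target == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_sim = max(-1.0, min(1.0, dot / (norm_pred * norm_target)))
    return math.degrees(math.acos(cos_sim))


def mean_angular_error_deg(preds, targets):
    """Average angular error in degrees over paired predictions and targets."""
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal in length")
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


# Illustrative vectors only; these are not values from any benchmark.
predictions = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
ground_truth = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
error = mean_angular_error_deg(predictions, ground_truth)
```

Normalizing inside the function means the inputs do not need to be unit vectors, which is convenient when a model regresses unnormalized directions.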