2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.00379
Goal-Oriented Gaze Estimation for Zero-Shot Learning

Abstract: Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen classes. Since semantic knowledge is built on attributes shared between different classes, which are highly local, strong prior for localization of object attribute is beneficial for visual-semantic embedding. Interestingly, when recognizing unseen images, human would also automatically gaze at regions with certain semantic clue. Therefore, we introduce a novel goal-oriented gaze estimation m…

Cited by 81 publications (45 citation statements)
References 54 publications

“…Furthermore, the holistic visual features are limited to poor transferable from one domain to another domain (e.g., from seen to unseen classes) [45], [46]. More relevant to this work are the recent attention-based ZSL methods [28], [30], [31], [32], [47] that utilize attribute descriptions as guidance to discover the more discriminative region (or part) features. Unfortunately, they simply learn region embeddings (e.g., the whole bird body) neglecting the importance of discriminative attribute localization (e.g., the distinctive bird body parts).…”
Section: Zero-shot Learning
Citation type: mentioning
confidence: 99%
“…However, these methods still usually yield relatively undesirable results, since they cannot efficiently capture the subtle differences between seen and unseen classes. More relevant to this work are the recent attention-based ZSL methods (Xie et al. 2019, 2020; Zhu et al. 2019; Xu et al. 2020; Liu et al. 2021) that utilize attribute descriptions as guidance to discover the more discriminative region (or part) features. Unfortunately, they simply learn region embeddings (e.g., the whole bird body) neglecting the importance of discriminative attribute localization (e.g., the distinctive bird body parts).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Papers [9][10][11] are representatives of the second group. Their models use the class definition vectors as a fixed classification layer, and add modules that help the network localize and implicitly detect attributes in the visual space.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Their models use the class definition vectors as a fixed classification layer, and add modules that help the network localize and implicitly detect attributes in the visual space. We focus on and extend [9] in our Method section, since it does not require extra knowledge such as human gaze points as leveraged by [11], or too many added loss terms to fine-tune as proposed by [10].…”
Section: Related Work
Citation type: mentioning
confidence: 99%