2022
DOI: 10.1007/978-3-031-19836-6_10

Adaptive Fine-Grained Sketch-Based Image Retrieval

Cited by 15 publications (13 citation statements)
References 42 publications
“…Furthermore, sketch traits like style diversity [52], data scarcity [5], and redundancy of sketch strokes [6] were addressed in favor of retrieval. Towards generalising to novel classes, while [42] modelled a universal manifold of prototypical visual sketch traits embedding sketch and photo, [8] adapted to new classes via a few supporting sketch-photo pairs. In this paper, we aim to address the problem of zero-shot cross-category FG-SBIR, leveraging the zero-shot potential of a foundation model like CLIP [46].…”
Section: Related Work
confidence: 99%
“…In particular, along with the triplet loss, we impose a classification loss on the sketch/photo joint-embedding space. For this, instead of the usual auxiliary $N_s$-class FC-layer-based classification head [8,16,19], we use CLIP's text encoder, which is already enriched with semantic-visual associations, to compute the classification objective. Following [24], we construct a set of handcrafted prompt templates like 'a photo of a [category]' to obtain a list of classification weight vectors $\{t_j\}_{j=1}^{N_s}$ from CLIP's text encoder, where the '[category]' token is filled with a specific class name from a list of $N_s$ seen classes.…”
Section: Prompt Learning for ZS-SBIR
confidence: 99%
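
To make the quoted recipe concrete, here is a minimal sketch (not the authors' released code) of building classification weights from CLIP's text encoder via handcrafted prompts, and combining the resulting cross-entropy objective with a triplet loss over the joint embedding space. The openai/CLIP package is assumed; the class list, temperature, margin, and loss weight `lam` are illustrative placeholders.

```python
# Minimal sketch: CLIP-text classification weights from handcrafted prompts,
# combined with a triplet loss on a joint sketch/photo embedding space.
# Class names, `temperature`, `margin`, and `lam` are illustrative assumptions.
import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

seen_classes = ["cat", "shoe", "chair"]  # hypothetical list of N_s seen classes
prompts = [f"a photo of a {c}" for c in seen_classes]  # handcrafted template

with torch.no_grad():
    tokens = clip.tokenize(prompts).to(device)
    t = model.encode_text(tokens).float()   # {t_j}, shape (N_s, d)
    t = F.normalize(t, dim=-1)              # unit-norm classification weights

def classification_loss(embeddings, labels, temperature=0.01):
    """Cross-entropy over cosine similarities to the CLIP text weights.

    `embeddings`: sketch/photo features from the joint space, shape (B, d),
    with d matching CLIP's text dimension. `labels`: indices into seen_classes.
    """
    z = F.normalize(embeddings, dim=-1)
    logits = z @ t.T / temperature          # (B, N_s) similarity logits
    return F.cross_entropy(logits, labels)

def total_loss(s, p_pos, p_neg, labels, margin=0.2, lam=1.0):
    """Triplet loss over (sketch, matching photo, non-matching photo),
    plus the CLIP-text classification loss on both modalities."""
    trip = F.triplet_margin_loss(s, p_pos, p_neg, margin=margin)
    cls = classification_loss(torch.cat([s, p_pos]), labels.repeat(2))
    return trip + lam * cls
```

In this sketch the text-derived weight vectors stay frozen, so the classification head carries CLIP's semantic structure rather than parameters fitted only to the seen classes, which is what the quoted passage leverages for the zero-shot setting.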