2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00243
Partially Does It: Towards Scene-Level FG-SBIR with Partial Input

Cited by 22 publications (5 citation statements)
References 59 publications
“…In Figure 2, the fusion segmentation module (Fus-Seg) shows an intuitive synergy [30,31] between different layers in the network. Then, the module generates labels for every point in the point cloud to extend the expression dimensions of the point cloud features.…”
Section: Fusion Segmentation Module
confidence: 99%
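The excerpt above describes extending per-point features with generated labels. As a rough illustration only, here is a minimal PyTorch sketch of that idea: per-point segmentation logits are converted to hard labels, one-hot encoded, and concatenated to the point features, extending their dimensionality. All names, the one-hot encoding, and the function shape are assumptions for illustration, not details from the cited paper.

```python
import torch
import torch.nn.functional as F

def extend_point_features(features: torch.Tensor,
                          label_logits: torch.Tensor,
                          num_classes: int) -> torch.Tensor:
    """Hypothetical sketch: append a per-point label encoding to point features.

    features:     (N, D) per-point features
    label_logits: (N, num_classes) outputs of a segmentation head
    Returns:      (N, D + num_classes) features extended with label channels
    """
    labels = label_logits.argmax(dim=-1)                  # (N,) hard labels per point
    one_hot = F.one_hot(labels, num_classes).float()      # (N, num_classes)
    # Concatenate label channels onto the original features,
    # "extending the expression dimensions" of the point cloud.
    return torch.cat([features, one_hot], dim=-1)
```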
“…Detailed sketch and text input have been used to (a) retrieve e-commerce product images using CNNs and LSTMs (Song et al. 2017a), and (b) retrieve scene images using CLIP (Sangkloy et al. 2022; Chowdhury et al. 2023a). However, in several practical scenarios, (a) the sketch is object-level, very rough, and not elaborate, and (b) the text is partial (complementary to the sketch) and not self-contained.…”
Section: Related Work
confidence: 99%
“…Although a vast literature exists on TBIR and SBIR, to the best of our knowledge, the CSTBIR problem setting has yet to be studied rigorously. Some recent works (Song et al. 2017a; Sangkloy et al. 2022; Chowdhury et al. 2023a) attempt to solve a simpler version, where: (a) the target image collection consists of focused objects rather than complex natural scenes, (b) the sketch is at scene-level rather than object-level, or (c) the text description is comprehensive rather than partial (or complementary). This paper proposes a system for the complex CSTBIR setting.…”
Section: Introduction
confidence: 99%
“…(5) Fine-Grained Discriminative Loss: While the reconstruction loss aims to align pixel values between the generated and ground-truth photo, the discriminative sketch-photo (paired) association relative to other photos needs to be modelled further, so that the output space reflects the fine-grained user intent of the input sketch. A triplet objective with cosine distance on a pre-trained fine-grained SBIR [19] model F_g(·) places a sketch nearer to its paired photo than to other photos in a joint embedding space. Therefore, we compute a discriminative fine-grained loss that measures the cosine similarity between s and r as:…”
Section: Training Procedures
confidence: 99%
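The fine-grained discriminative loss quoted above lends itself to a short sketch. Below is a minimal PyTorch version, assuming a frozen pre-trained FG-SBIR encoder (here called fg_encoder, a hypothetical name) that embeds sketches and photos into a joint space, and taking the loss as one minus the cosine similarity between the sketch embedding s and the generated-photo embedding r. The exact loss form in the paper may differ; this only instantiates the mechanism the excerpt describes.

```python
import torch
import torch.nn.functional as F

def fine_grained_loss(fg_encoder: torch.nn.Module,
                      sketch: torch.Tensor,
                      generated_photo: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of the fine-grained discriminative loss:
    1 - cos(s, r), where s and r are embeddings of the input sketch
    and the generated photo under a pre-trained FG-SBIR encoder.
    `fg_encoder` and the exact loss form are assumptions, not the
    paper's verbatim implementation."""
    # The FG-SBIR encoder is pre-trained and assumed frozen
    # (its parameters have requires_grad=False). The sketch branch
    # needs no gradient at all, but gradients must still flow through
    # the generated photo so the upstream generator can be trained.
    with torch.no_grad():
        s = fg_encoder(sketch)           # sketch embedding, shape (B, D)
    r = fg_encoder(generated_photo)      # generated-photo embedding, (B, D)
    # Maximising cosine similarity is equivalent to minimising
    # (1 - cosine similarity), averaged over the batch.
    return (1.0 - F.cosine_similarity(s, r, dim=-1)).mean()
```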