2017 IEEE International Conference on Computer Vision (ICCV) 2017
DOI: 10.1109/iccv.2017.592

Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval

Abstract: Human sketches are unique in being able to capture both the spatial topology of a visual object, as well as its subtle appearance details. Fine-grained sketch-based image retrieval (FG-SBIR) importantly leverages on such fine-grained characteristics of sketches to conduct instance-level retrieval of photos. Nevertheless, human sketches are often highly abstract and iconic, resulting in severe misalignments with candidate photos which in turn make subtle visual detail matching difficult. Existing FG-SBIR approac…

Cited by 211 publications (221 citation statements)
References 40 publications
“…As discussed in existing studies [44,42,6,5], CNNs may suffer from the sparsity of inputs (e.g., raster sketches), though they excel at building hierarchical representations of 2D inputs. Instead of struggling to estimate attention from binary images that contain limited information [34], we argue that additional cues, such as the temporal ordering and grouping information in vector sketches, are essential to learn reliable attention for strokes. In our method, we resort to RNNs for computing attention for each point in a vector sketch, and use our NLR module for in-network vector-to-raster conversion.…”
Section: Related Work (mentioning)
confidence: 98%
“…With a trained SVM, Schneider et al [31] qualitatively analyzed how stroke importance affects classification scores by iteratively removing each stroke from the corresponding raster sketch image. To automatically capture stroke importance during the learning process, researchers have attempted to adapt an attention mechanism in network design [34]. Attention mechanism has been widely used in many visual tasks, such as image classification [24,40,37,10], image caption [41,22] or Visual Question Answering (VQA) [25].…”
Section: Related Work (mentioning)
confidence: 99%
“…The hand-crafted techniques mostly work with Bag-of-Words representations of sketch and edge map of natural image on top of some off-the-shelf features, such as SIFT [19], Gradient Field HOG [10], Histogram of Edge Local Orientations [25] or Learned Key Shapes [26], etc. This domain shift issue is further addressed by cross-domain deep learning-based methods [27,37], where they have used classical ranking losses, such as contrastive loss, triplet loss [32] or more elegant HOLEF loss [30] within a siamese-like network. Based on the problem at hand, two separated tasks have been identified: (1) Fine-grained SBIR (FG-SBIR) aims to capture fine-grained similarities of sketch and photo [15,27,37] and (2) Coarse-grained SBIR (CG-SBIR) performs an instance-level search across multiple object categories [38,10,11,31,38], which has received a lot of attention due to its importance.…”
Section: Related Work (mentioning)
confidence: 99%
“…grained matching [37,30,24], large-scale hashing [17,16], cross-modal attention [5,30] to name a few. However, a common bottleneck identified by almost all sketch researches is that of data scarcity.…”
Section: Introduction (mentioning)
confidence: 99%
“…Our Semantic-Aware Knowledge prEservation (SAKE) preserves original domain knowledge of rich visual features (e.g., visual details of different subtypes of cars) which helps distinguishing the right photo candidates (e.g., SUV) from distractors (e.g., race car) in the unseen classes. neural networks into this field [22,45,21,43,37,33,30,44,42,39]. In the conventional setting, it is assumed that training and testing images are from the same set of object categories, in which scenario existing approaches achieved satisfying performance [22].…”
Section: Introduction (mentioning)
confidence: 99%