2020
DOI: 10.1016/j.imavis.2020.104003

CrossATNet - a novel cross-attention based framework for sketch-based image retrieval

Abstract: We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR). Conventionally, the SBIR schema mainly considers simultaneous mappings among the two image views and the semantic side information. Therefore, it is desirable to consider fine-grained classes mainly in the sketch domain using highly discriminative and semantically rich feature space. However, the existing deep generative modelling based SBIR approaches majorly focus on bridging the gaps …

Cited by 28 publications
(18 citation statements)
References 30 publications
“…These cross-sensor retrieval techniques also extend to retrieving unseen class images upon deployment, commonly called zero-shot cross-modal retrieval [45]. As one of the major bottlenecks of solving remote sensing problems using deep learning is the lack of annotated samples for training, these cross-sensor zero-shot retrieval has received a lot of attention recently [18].…”
Section: B Retrieval In RS (mentioning)
confidence: 99%
“…Recently, we have seen a lot of focus on cross-sensor/cross-modal retrieval techniques in RS using various learning techniques [15], [16]. Some of the notable works in this domain are presented in [17], [18], [19], [20]. Several literary works in this domain have tried to exploit the conventional triplet loss or the Siamese loss function to discriminate classes within a fine-grained dataset.…”
Section: Introduction (mentioning)
confidence: 99%
“…Most existing online hashing methods have been devoted to the trade-off between accuracy and efficiency [15][16][17]. According to the learning strategy, people divide these techniques into unsupervised online hashing and supervised online hashing [18][19][20]. The well-known unsupervised methods mainly include online sketch hashing (SketchHash) [21], FasteR online sketch hash (FROSH) [22], and zero-mean sketch [23].…”
Section: Introduction (mentioning)
confidence: 99%
“…However, while [9], [10], [19] used the semantic space, they made the semantic space latent and learnable causing the network to eventually loose the classwise topology information as the network is trained for more number of epochs. To preserve the original topology of the semantic space, [21], [22] proposed using a graph convolution network (GCN) [23]. While in [22] the authors use a GCN directly on the semantic graph, in [21] the authors create a fully-connected graph whose edge weights correspond to the semantic distances and the node features comprise of classwise visual features.…”
Section: Introduction (mentioning)
confidence: 99%
“…To preserve the original topology of the semantic space, [21], [22] proposed using a graph convolution network (GCN) [23]. While in [22] the authors use a GCN directly on the semantic graph, in [21] the authors create a fully-connected graph whose edge weights correspond to the semantic distances and the node features comprise of classwise visual features.…”
Section: Introduction (mentioning)
confidence: 99%