2020
DOI: 10.1007/978-3-030-58601-0_33

Adaptive Offline Quintuplet Loss for Image-Text Matching

Cited by 56 publications (30 citation statements)
References 26 publications
“…We choose the latest work in the past two years as baseline methods for comparison with Global Relation-aware Attention Network (GRAN). These include SCAN [19], ACMNet [5], CASC [6], DP-RNN [9], MMCA [39], CAAN [44], IMRAM [7], AAMEL [38], SMAN [16], M3A-Net [15], which use cross-related methods; SGM [35], Guo et al [12], which use GCN [18]; Polynomial Loss [37], AMF [26], Chen et al [8], which introduce new loss; and TERAN [24], which uses transformer. R@K (K = 1, 5, 10) is adopted to evaluate the cross-modal retrieval performance of all methods.…”
Section: Baseline Methods and Evaluation Metrics (mentioning)
confidence: 99%
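R@K here denotes Recall@K: the fraction of queries whose ground-truth match is ranked among the top K retrieved candidates. Below is a minimal sketch of how it can be computed from a similarity matrix, assuming one ground-truth match per query sitting on the diagonal; the function name and the single-match setup are illustrative assumptions, not code from the cited papers.

```python
import numpy as np

def recall_at_k(sim, ks=(1, 5, 10)):
    """Recall@K over an (N, N) similarity matrix.

    sim[i, j] scores query i against candidate j; the correct candidate for
    query i is assumed to be candidate i (one ground-truth match per query).
    """
    n = sim.shape[0]
    ranks = np.empty(n, dtype=int)
    for i in range(n):
        order = np.argsort(-sim[i])            # candidates sorted by descending similarity
        ranks[i] = np.where(order == i)[0][0]  # rank of the ground-truth candidate
    return {k: float(np.mean(ranks < k)) for k in ks}
```

In common image-text retrieval benchmarks each image typically has several matching captions, so real evaluation code counts a hit if any of them lands in the top K; the sketch keeps the single-match case for brevity.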
“…Our model modifies the conventional triplet [15] network architecture with multi-instance inputs and defines a custom loss function. There have been previous efforts in using generic n-tuple inputs [24,25], and a variety of loss functions such as contrastive loss [26], triplet-center loss [27], lifted loss [28], histogram loss [29], multi-similarity loss [30] and circle loss [31] have been explored. While we share with these models the general intention of designing an objective function that assigns larger weights to informative inputs, our work differs with its focus on introducing different notions of similarity rather than just improving the pair selection strategy.…”
Section: Related Work (mentioning)
“…We start by formally introducing the standard contrastive learning framework commonly used in previous works (Lee et al., 2018; Chen et al., 2020b)…”
Section: Contrastive Learning (mentioning)
confidence: 99%
“…L_{i−t} corresponds to image-to-text retrieval, while L_{t−i} corresponds to text-to-image retrieval (or image search). Common negative sampling strategies include selecting all the negatives (Huang et al., 2017), selecting hard negatives with the highest similarity scores in the mini-batch (Faghri et al., 2018), and selecting hard negatives from the whole training data (Chen et al., 2020b). Minimizing the margin-based triplet loss will make positive image-text pairs closer to each other than other negative samples in the joint embedding space.…”
Section: Contrastive Learning (mentioning)
confidence: 99%
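As a concrete illustration of the margin-based triplet loss and the in-batch hard-negative strategy described above (Faghri et al., 2018), here is a minimal PyTorch-style sketch. The function name, the margin value, and the assumption that positive pairs sit on the diagonal of the similarity matrix are illustrative choices, not code from any of the cited papers.

```python
import torch

def triplet_loss_hard_negative(sim, margin=0.2):
    """Bidirectional margin-based triplet loss over a (B, B) similarity matrix.

    sim[i, j] is the similarity between image i and text j; matching pairs are
    assumed to lie on the diagonal, and all off-diagonal entries are negatives.
    """
    pos = sim.diag().view(-1, 1)                       # s(i, t_i), one positive per image
    cost_i2t = (margin + sim - pos).clamp(min=0)       # image-to-text margin violations
    cost_t2i = (margin + sim - pos.t()).clamp(min=0)   # text-to-image margin violations
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_i2t = cost_i2t.masked_fill(mask, 0.0)         # ignore the positive pair itself
    cost_t2i = cost_t2i.masked_fill(mask, 0.0)
    # hardest in-batch negative: keep only the largest violation per query
    return cost_i2t.max(dim=1)[0].mean() + cost_t2i.max(dim=0)[0].mean()
```

Summing all off-diagonal costs instead of taking the per-query maximum corresponds to the "all negatives" strategy (Huang et al., 2017), while mining hard negatives over the whole training data rather than the mini-batch is the strategy attributed above to Chen et al. (2020b).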