2019
DOI: 10.48550/arxiv.1905.13339
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Multitask Text-to-Visual Embedding with Titles and Clickthrough Data

Abstract: Text-visual (or called semantic-visual) embedding is a central problem in vision-language research. It typically involves mapping of an image and a text description to a common feature space through a CNN image encoder and a RNN language encoder. In this paper, we propose a new method for learning text-visual embedding using both image titles and click-through data from an image search engine. We also propose a new triplet loss function by modeling positive awareness of the embedding, and introduce a novel min… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2

Citation Types

0
2
0

Year Published

2021
2021
2021
2021

Publication Types

Select...
1

Relationship

1
0

Authors

Journals

citations
Cited by 1 publication
(2 citation statements)
references
References 8 publications
(8 reference statements)
0
2
0
Order By: Relevance
“…For each text caption (anchor text) and (positive) image pair, we mine a hard negative sample within a training mini-batch using the online negative sampling strategy from [2]. We treat the caption corresponding to the negative image as the hard negative text.…”
Section: Training Strategymentioning
confidence: 99%
See 1 more Smart Citation
“…For each text caption (anchor text) and (positive) image pair, we mine a hard negative sample within a training mini-batch using the online negative sampling strategy from [2]. We treat the caption corresponding to the negative image as the hard negative text.…”
Section: Training Strategymentioning
confidence: 99%
“…For our experiments we see that when 𝜌 = 4, 𝛼 1 = 0.5 and 𝛼 2 = 1, we get the best results. To confirm its efficiency, we compare our results with another metric learning loss called "Positive Aware Triplet Ranking Loss (PATR)" [2] which performs a similar task without negative text.…”
Section: Training Strategymentioning
confidence: 99%