2016
DOI: 10.1109/tip.2015.2507401

Learning of Multimodal Representations With Random Walks on the Click Graph

Abstract: In multimedia information retrieval, most classic approaches tend to represent different modalities of media in the same feature space. With the click data collected from the users' searching behavior, existing approaches take either one-to-one paired data (text-image pairs) or ranking examples (text-query-image and/or image-query-text ranking lists) as training examples, which do not make full use of the click data, particularly the implicit connections among the data objects. In this paper, we treat the clic…
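The mechanism summarized in the abstract, treating query-image clicks as a weighted bipartite graph and sampling random walks over it to obtain node sequences for representation learning, can be sketched in a few lines of Python. The click-log triples, node names, and walk length below are illustrative assumptions, not the paper's actual data or implementation.

```python
import random
from collections import defaultdict

# Hypothetical click log: (text query, clicked image, click count) triples.
click_log = [
    ("red shoes", "img_001", 12),
    ("red shoes", "img_007", 3),
    ("running shoes", "img_007", 9),
    ("running shoes", "img_031", 5),
]

# Undirected bipartite click graph; click counts become edge weights.
graph = defaultdict(dict)
for query, image, clicks in click_log:
    graph[query][image] = graph[query].get(image, 0) + clicks
    graph[image][query] = graph[image].get(query, 0) + clicks

def random_walk(graph, start, length=8):
    """Sample a walk whose transitions are proportional to click counts."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(graph[walk[-1]].keys())
        if not neighbors:
            break
        weights = list(graph[walk[-1]].values())
        walk.append(random.choices(neighbors, weights=weights, k=1)[0])
    return walk

# A walk alternates between queries and images, so nodes of both
# modalities that share clicks end up in the same sequences.
print(random_walk(graph, "red shoes"))
```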

Cited by 41 publications (26 citation statements). References 27 publications.
“…It exploits user-image links to embed users and images into the same space so that they can be directly compared for image recommendation. In [36], a click graph is considered which contains images and text queries. The image-query edge indicates a click of an image given a query, where the click count serves as the edge weight.…”
Section: Heterogeneous Graph
Mentioning confidence: 99%
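As the excerpt notes, the click count on each image-query edge serves as the edge weight; a hedged one-function sketch (reusing the toy graph dictionary from the earlier example, which is an assumption) shows how those counts become transition probabilities for a walk:

```python
def transition_probabilities(graph, node):
    """Row-normalize click-count edge weights into a probability distribution."""
    total = sum(graph[node].values())
    return {neighbor: count / total for neighbor, count in graph[node].items()}

# With the toy graph above:
# transition_probabilities(graph, "red shoes") -> {'img_001': 0.8, 'img_007': 0.2}
```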
“…DGK [93]: graphlet kernel, random sampling [114]; 2nd (by graphlet); SkipGram (Eqs. 11-12). metapath2vec [46]: meta-path based random walk; 2nd, heterogeneous; SkipGram. ProxEmbed [44]: truncated random walk; node ranking tuples. HSNL [29]: truncated random walk; 2nd + QA ranking tuples; LSTM. RMNL [30]: truncated random walk; 2nd + user-question quality ranking. DeepCas [63]: Markov chain based random walk; information cascade sequence; GRU. MRW-MN [36]: truncated random walk; 2nd + cross-modal feature difference; DCNN + SkipGram. […] sequence of nodes as a fixed length vector, e.g., represent a sentence (i.e., a sequence of words) as one vector. LSTM is then adopted in such scenarios to embed a node sequence.…”
Section: GE Algorithm
Mentioning confidence: 99%
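Several of the methods listed in this excerpt combine truncated random walks with a SkipGram objective. Below is a minimal sketch of that generic recipe, assuming the gensim library and reusing the toy `graph` and `random_walk` helper from the earlier example (all assumptions, not the code of any cited method):

```python
from gensim.models import Word2Vec

# Truncated random walks from every node; each walk is treated as a "sentence".
walks = [random_walk(graph, node, length=8) for node in graph for _ in range(10)]

# SkipGram (sg=1) over the walks embeds queries and images in one vector space.
model = Word2Vec(sentences=walks, vector_size=64, window=5, min_count=0, sg=1)
query_vec = model.wv["red shoes"]   # embedding of a text query
image_vec = model.wv["img_001"]     # embedding of an image node
```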
“…Figure 1 (b) illustrates this point: there is only a single tag happiness and none for the concrete objects in the image. For the big search engines, one way of overcoming this bottleneck is to exploit query-and-click logs (e.g., [6,17,45]). The query keyword(s) associated with a click can be treated as label(s) for the clicked image.…”
Section: Introduction
Mentioning confidence: 99%
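The excerpt's idea, treating the query keyword(s) behind a click as weak labels for the clicked image, can be sketched as a small aggregation over a click log; the log rows, term-count threshold, and whitespace tokenization below are hypothetical choices for illustration:

```python
from collections import Counter, defaultdict

# Hypothetical click log rows: (query, clicked image, click count).
click_log = [
    ("golden retriever puppy", "img_204", 7),
    ("puppy", "img_204", 2),
    ("retriever", "img_513", 4),
]

# Accumulate query terms per image; frequently clicked terms become weak labels.
term_counts = defaultdict(Counter)
for query, image, clicks in click_log:
    for term in query.split():
        term_counts[image][term] += clicks

weak_labels = {img: [t for t, c in counts.items() if c >= 2]
               for img, counts in term_counts.items()}
print(weak_labels["img_204"])  # ['golden', 'retriever', 'puppy']
```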
“…Two, represent queries in a visual space by learning visual templates of the queries [19], [20], and compute the cross-media similarity by visual matching. Third, also the main stream, build a common latent space, e.g., by maximizing correlation between relevant image-query pairs [10], [11], [13], [21] or by minimizing a ranking based loss [7], [9], [12], [22]. Given the progress described above, an important question then is Q1.…”
Mentioning confidence: 99%
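The "ranking based loss" route mentioned in this excerpt is commonly realized as a margin (triplet) loss that pushes a relevant image above an irrelevant one for the same query in the shared space. The sketch below is a generic illustration with random stand-in vectors, not the loss of any specific cited paper:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_ranking_loss(query, pos_image, neg_image, margin=0.2):
    """Hinge loss: the relevant image should outscore the irrelevant one by `margin`."""
    return max(0.0, margin - cosine(query, pos_image) + cosine(query, neg_image))

rng = np.random.default_rng(0)
q, pos, neg = rng.normal(size=(3, 64))   # stand-ins for embedded query and images
print(triplet_ranking_loss(q, pos, neg))
```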