Abstract: Traditional image tagging and retrieval algorithms have limited value because they are trained on heavily curated datasets. These limitations are most evident when arbitrary search words are used that do not intersect with the training-set labels. Weak labels from user-generated content (UGC) found in the wild (e.g., Google Photos, Flickr) contain an almost unlimited number of unique words in their metadata tags. Prior work on word embeddings successfully leveraged unstructured text with large vocabularies, …
“…Fast0Tag [14] projects an image by identifying a principal direction in the embedding space and targeting that principal direction when learning to project the image. [15] uses noise contrastive estimation on a noisy web-scale dataset [16] to learn a projection from images to the word-embedding space. VSE++ [17] proposes a modified pairwise ranking loss weighted by the violation caused by hard negatives.…”
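To make the VSE++ objective concrete, the sketch below implements a batch-wise max-of-hinges ranking loss that contrasts each matched image–caption pair against its hardest in-batch negative. This is a minimal PyTorch sketch of the general technique, not the reference implementation from [17]; the function name `vsepp_loss`, the margin value, and the in-batch negative mining are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def vsepp_loss(img_emb, txt_emb, margin=0.2):
    """Max-of-hinges ranking loss: each positive image-caption pair is
    contrasted only against its hardest in-batch negative."""
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    scores = img_emb @ txt_emb.t()            # (n, n) cosine similarities
    pos = scores.diag().view(-1, 1)           # matched pairs on the diagonal

    cost_cap = (margin + scores - pos).clamp(min=0)      # image -> caption
    cost_img = (margin + scores - pos.t()).clamp(min=0)  # caption -> image

    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_cap = cost_cap.masked_fill(mask, 0)             # ignore positives
    cost_img = cost_img.masked_fill(mask, 0)

    # hardest negative per image (rows) and per caption (columns)
    return cost_cap.max(dim=1).values.mean() + cost_img.max(dim=0).values.mean()

# Usage with random features standing in for real encoder outputs.
imgs, caps = torch.randn(32, 512), torch.randn(32, 512)
print(vsepp_loss(imgs, caps))
```

Taking the maximum over negatives, rather than summing over all of them, is the key change that distinguishes this loss from the standard pairwise ranking objective.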
Learning a robust shared representation space is critical for effective multimedia retrieval, and is increasingly important as multimodal data grows in volume and diversity. The labeled datasets necessary for learning such a space are limited in size and in coverage of semantic concepts. These limitations constrain performance: a shared representation learned on one dataset may not generalize well to another. We address this issue by building on the insight that, given limited data, it is easier to optimize the semantic structure of a space within a modality than across modalities. We propose a two-stage shared representation learning framework with intra-modal optimization and subsequent cross-modal transfer learning of semantic structure that produces a robust shared representation space. We integrate multi-task learning into each step, making it possible to leverage multiple datasets, annotated with different concepts, as if they were one large dataset. Large-scale systematic experiments demonstrate improvements over previously reported state-of-the-art methods on cross-modal retrieval tasks.
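As a rough illustration of the two-stage recipe described in this abstract, the sketch below first optimizes semantic structure within the text modality (stage one), then freezes that space and aligns an image encoder to it (stage two). All encoder shapes, losses, and the synthetic batches are hypothetical placeholders, since the abstract does not specify the architecture; the multi-task aspect is reduced here to iterating over batches drawn from multiple datasets.

```python
import torch
import torch.nn.functional as F

# Hypothetical encoders into a shared space; dimensions are placeholders.
text_enc = torch.nn.Linear(300, 256)   # e.g. word vector -> shared space
img_enc = torch.nn.Linear(2048, 256)   # e.g. CNN feature -> shared space

def intra_modal_loss(emb, labels):
    """Stage-one objective (assumed form): pull same-concept embeddings
    together and push different-concept embeddings apart."""
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.t()
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    return (1 - sim)[same].mean() + sim[~same].clamp(min=0).mean()

# Synthetic stand-ins for batches drawn from several labeled datasets.
text_batches = [(torch.randn(32, 300), torch.randint(0, 10, (32,)))]
paired_batches = [(torch.randn(32, 2048), torch.randn(32, 300))]

# Stage 1: optimize semantic structure *within* the text modality.
opt = torch.optim.Adam(text_enc.parameters())
for feats, labels in text_batches:
    loss = intra_modal_loss(text_enc(feats), labels)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: freeze the text space and transfer its structure across
# modalities by aligning image embeddings with paired text embeddings.
for p in text_enc.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(img_enc.parameters())
for img_feats, txt_feats in paired_batches:
    target = F.normalize(text_enc(txt_feats), dim=1)
    pred = F.normalize(img_enc(img_feats), dim=1)
    loss = (1 - (pred * target).sum(dim=1)).mean()  # cosine alignment
    opt.zero_grad(); loss.backward(); opt.step()
```

The design intuition the abstract points to is that the stage-one structure can be learned from plentiful single-modality labels, so the harder cross-modal alignment in stage two only has to transfer an already well-organized space.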