Proceedings of the 27th ACM International Conference on Multimedia 2019
DOI: 10.1145/3343031.3350995
Adaptive Semantic-Visual Tree for Hierarchical Embeddings

Abstract: Merchandise categories inherently form a semantic hierarchy with different levels of concept abstraction, especially for fine-grained categories. This hierarchy encodes rich correlations among various categories across different levels, which can effectively regularize the semantic space and thus make prediction less ambiguous. However, previous studies of fine-grained image retrieval primarily focus on semantic similarities or visual similarities. In real application, merely using visual similarity may not sa…

Cited by 21 publications (13 citation statements)
References 29 publications (36 reference statements)
“…Vision-Language Learning. Aligning image and text into a common feature space has long been an active research topic (Frome et al., 2013; Joulin et al., 2016; Desai & Johnson, 2021; Yang et al., 2019; …). Recently, the contrastive language-image pre-training model CLIP (Radford et al., 2021c) demonstrates a surprising ability of zero-shot transfer to downstream classification vision tasks.…”
Section: Related Work
Mentioning confidence: 99%
“…Design Choices. The formulation of CCL is simple and largely inspired by the contrastive loss [8, 38] widely used in computer vision tasks such as face recognition and image retrieval. But we make several design choices that differ from the most widely used loss functions in CF and greatly facilitate model training.…”
Section: Cosine Contrastive Loss
Mentioning confidence: 99%
“…Towards this goal, we systematically compare multiple commonly used loss functions and also investigate the impact of the negative sampling ratio on each loss function. Moreover, inspired by the contrastive loss [8, 38] widely used in computer vision, we propose a cosine contrastive loss (CCL) tailored for CF. Our CCL loss optimizes the embedding by maximizing the cosine similarity of a positive user-item pair, while minimizing the similarity of a negative pair down to a certain margin.…”
Section: Introduction
Mentioning confidence: 99%
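The excerpt above describes the cosine contrastive loss only in words. A minimal sketch of such a loss, assuming one positive item and a set of sampled negatives per user (function name, margin, and weighting are illustrative assumptions, not the cited paper's exact formulation):

```python
import numpy as np

def cosine_contrastive_loss(user, pos_item, neg_items, margin=0.8, neg_weight=1.0):
    """Sketch of a cosine contrastive loss (CCL) as described in the quote:
    pull the positive user-item pair together in cosine similarity, and
    penalize negatives only when their similarity exceeds the margin.
    Parameter names and defaults are assumptions for illustration."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    pos_term = 1.0 - cos(user, pos_item)            # maximize positive similarity
    neg_terms = [max(0.0, cos(user, j) - margin)    # hinge: ignore easy negatives
                 for j in neg_items]
    return pos_term + neg_weight * float(np.mean(neg_terms))
```

The hinge on the negative term means negatives already less similar than the margin contribute zero gradient, which is the "minimizing the similarity of a negative pair down to a certain margin" behavior the excerpt mentions.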
“…Hierarchical knowledge. Hierarchical information has been validated as useful for many tasks (Chao et al. 2019; Yang et al. 2019; Bugatti, Saito, and Davis 2019; Chen et al. 2019b). But there are only very few works utilizing hierarchical information on relationship-detection related tasks.…”
Section: Related Work
Mentioning confidence: 99%