Proceedings of the 27th ACM International Conference on Multimedia 2019
DOI: 10.1145/3343031.3350995
Adaptive Semantic-Visual Tree for Hierarchical Embeddings

Abstract: Merchandise categories inherently form a semantic hierarchy with different levels of concept abstraction, especially for fine-grained categories. This hierarchy encodes rich correlations among various categories across different levels, which can effectively regularize the semantic space and thus make prediction less ambiguous. However, previous studies of fine-grained image retrieval primarily focus on semantic similarities or visual similarities. In real application, merely using visual similarity may not sa…

Cited by 21 publications (13 citation statements)
References 29 publications (36 reference statements)
“…Vision-Language Learning. Aligning image and text into a common feature space has long been an active research topic (Frome et al., 2013; Joulin et al., 2016; Desai & Johnson, 2021; Yang et al., 2019; …). Recently, the contrastive language-image pre-training model CLIP (Radford et al., 2021c) demonstrates a surprising ability of zero-shot transfer to downstream classification vision tasks.…”
Section: Related Work
Mentioning confidence: 99%
“…Design Choices. The formulation of CCL is simple and largely inspired by the contrastive loss [8, 38] widely used in computer vision tasks such as face recognition and image retrieval. But we make several design choices that differ from the most widely used loss functions in CF and greatly facilitate model training.…”
Section: Cosine Contrastive Loss
Mentioning confidence: 99%
“…Towards this goal, we systematically compare multiple commonly used loss functions and also investigate the impact of the negative sampling ratio on each loss function. Moreover, inspired by the contrastive loss [8, 38] widely used in computer vision, we propose a cosine contrastive loss (CCL) tailored for CF. Our CCL loss optimizes the embedding by maximizing the cosine similarity of a positive user-item pair, while minimizing the similarity of a negative pair down to a certain margin.…”
Section: Introduction
Mentioning confidence: 99%
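The excerpt above describes the cosine contrastive loss only in words. A minimal sketch of such a loss, assuming one positive item and a set of sampled negatives per user (function name, margin, and weighting are illustrative assumptions, not the cited paper's exact formulation):

```python
import numpy as np

def cosine_contrastive_loss(user, pos_item, neg_items, margin=0.8, neg_weight=1.0):
    """Sketch of a cosine contrastive loss (CCL) as described in the quote:
    pull the positive user-item pair together in cosine similarity, and
    penalize negatives only when their similarity exceeds the margin.
    Parameter names and defaults are assumptions for illustration."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    pos_term = 1.0 - cos(user, pos_item)            # maximize positive similarity
    neg_terms = [max(0.0, cos(user, j) - margin)    # hinge: ignore easy negatives
                 for j in neg_items]
    return pos_term + neg_weight * float(np.mean(neg_terms))
```

The hinge on the negative term means negatives already less similar than the margin contribute zero gradient, which is the "minimizing the similarity of a negative pair down to a certain margin" behavior the excerpt mentions.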
“…Hierarchical knowledge. Hierarchical information has been validated as useful for many tasks (Chao et al. 2019; Yang et al. 2019; Bugatti, Saito, and Davis 2019; Chen et al. 2019b). But there are only very few works utilizing hierarchical information on relationship-detection related tasks.…”
Section: Related Work
Mentioning confidence: 99%