2021
DOI: 10.1609/aaai.v35i3.16333

Scene Graph Embeddings Using Relative Similarity Supervision

Abstract: Scene graphs are a powerful structured representation of the underlying content of images, and embeddings derived from them have been shown to be useful in multiple downstream tasks. In this work, we employ a graph convolutional network to exploit structure in scene graphs and produce image embeddings useful for semantic image retrieval. Different from classification-centric supervision traditionally available for learning image representations, we address the task of learning from relative similarity labels i…

Cited by 8 publications (6 citation statements)
References 32 publications
“…Due to the lack of ground truth for this task, we use common metrics that are used in image collection scene-graph summarization tasks [2], [19]; similarity [16], [49], [50], coverage [28], [51], and diversity [52], [53] of a generated scene graph to the ground-truth scene graph of each image. However, most evaluation techniques focus on estimating the generating precision, in which the evaluation score tends to increase based on the quantity of the generated results.…”
Section: Evaluation Process (mentioning)
confidence: 99%
“…As such, we introduce an evaluation process which focuses on evaluating the quality of a summarized scene graph using F-score based on estimating the similarity between scene graphs. Since the estimation of the similarity between scene graphs has been attempted with various approaches, the technique of using word embedding shows a better qualitative estimation in scene-graph generation [50].…”
Section: Evaluation Process (mentioning)
confidence: 99%
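
The citing statement above describes an F-score built on similarity between scene graphs but gives no formula. The sketch below is one possible reading, not the cited authors' metric: it treats a scene graph as a list of (subject, predicate, object) triples and computes a soft precision and recall from best-match similarities. The function names and the toy `triple_similarity` are hypothetical; the statement mentions word embeddings, for which the position-matching similarity here is only a stand-in.

```python
def triple_similarity(a, b):
    """Toy stand-in (assumption): fraction of matching positions in two
    (subject, predicate, object) triples. A word-embedding similarity
    could be substituted here."""
    return sum(x == y for x, y in zip(a, b)) / 3.0


def soft_f_score(generated, reference, sim=triple_similarity):
    """Harmonic mean of soft precision and soft recall between two triple lists."""
    if not generated or not reference:
        return 0.0
    # Soft precision: how well each generated triple is matched in the reference.
    precision = sum(max(sim(g, r) for r in reference) for g in generated) / len(generated)
    # Soft recall: how well each reference triple is matched by the generated graph.
    recall = sum(max(sim(r, g) for g in generated) for r in reference) / len(reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


generated = [("man", "rides", "horse")]
reference = [("man", "rides", "horse"), ("horse", "on", "grass")]
print(soft_f_score(generated, reference))
```

Because recall is averaged over the reference triples, a summary that simply emits more triples does not automatically score higher, which is the concern the statement raises about precision-only evaluation.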
“…Contrastive learning approaches to representation learning have recently gained traction due to their success in several domains such as computer vision and natural language processing [6,8,19,23]. The intuition behind these approaches is to bring similar pairs of data points (typically referred to as the anchor and the positive) closer to each other than dissimilar pairs (anchor and negative) in an embedding space.…”
Section: Contrastive Learning (mentioning)
confidence: 99%
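
The anchor, positive, and negative intuition in the statement above is commonly expressed as a triplet margin loss. The following is a minimal sketch of that generic loss, assuming Euclidean distance and a margin of 1.0; it is not taken from the cited paper or from the scene graph embedding paper itself, and the function names are illustrative.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero when the positive is closer to the anchor than the negative
    by at least `margin`; otherwise grows with the size of the violation."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


anchor = [0.0, 0.0]
near = [0.0, 1.0]   # distance 1 from the anchor
far = [3.0, 4.0]    # distance 5 from the anchor

# Correct ordering: the similar item is already much closer, so no penalty.
print(triplet_margin_loss(anchor, near, far))
# Swapped ordering: the "positive" is farther away, so the loss is positive.
print(triplet_margin_loss(anchor, far, near))
```

Minimizing this quantity over many triplets pulls similar pairs together and pushes dissimilar pairs apart in the embedding space, which is the behavior the statement describes.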
“…To enable retrieval along specialized notions of image similarity, multiple image feature extractors have been developed. Some examples include shapes within content [22], co-occurrences of objects and their relationships [17], or styles [25]. We build on existing image representation methods (e.g.…”
Section: Introduction (mentioning)
confidence: 99%