Scene Graph Embeddings Using Relative Similarity Supervision

Maheshwari, Paridhi; Chaudhry, Ritwick; Vinay, V.

doi:10.1609/aaai.v35i3.16333

Cited by 8 publications

(6 citation statements)

References 32 publications

“…Due to the lack of ground truth for this task, we use common metrics that are used in image collection scene-graph summarization tasks [2], [19]; similarity [16], [49], [50], coverage [28], [51], and diversity [52], [53] of a generated scene graph to the ground-truth scene graph of each image. However, most evaluation techniques focus on estimating the generating precision, in which the evaluation score tends to increase based on the quantity of the generated results.…”

Section: Evaluation Process (mentioning)

confidence: 99%

“…As such, we introduce an evaluation process which focuses on evaluating the quality of a summarized scene graph using F-score based on estimating the similarity between scene graphs. Since the estimation of the similarity between scene graphs has been attempted with various approaches, the technique of using word embedding shows a better qualitative estimation in scene-graph generation [50].…”

Section: Evaluation Process (mentioning)

confidence: 99%

“…For multiple-image scene-graph summarization, we evaluate the proposed method for image-collection scene-graph summarization on the MS-COCO dataset. Due to the lack of ground truth, we follow the common practice in the evaluation of scene graph generation in three perspectives; ''Coverage'' [28], [51], ''Diversity'' [52], [53], and ''Similarity'' [49], [50]. For the Coverage evaluation, we follow the graph theory to estimate the coverage of a generated scene graph to ground-truth scene graphs.…”

Section: Multiple-images Scene-graph Summarization (mentioning)

confidence: 99%

Image-Collection Summarization Using Scene-Graph Generation With External Knowledge

Phueaksri, Kastner, Kawanishi et al. 2024, IEEE Access

Summarization tasks aim to summarize multiple pieces of information into a short description or representative information. A text summarization task is a task that summarizes textual information into a short description, whereas in an image collection summarization task, also known as the photo album summarization task, the goal is to find the representative visual information of all images in the collection. In recent years, scene-graph generation has shown the advantage of describing the visual contexts of a single image, and incorporating external knowledge into the scene-graph generation model has also given effective directions for unseen single-image scene-graph generation. Following this trend, in this paper, we propose a novel scene-graph-based image-collection summarization model. The key idea of the proposed method is to enhance the relation predictor toward relationships between images in an image collection incorporating knowledge graphs as external knowledge for training a model. To evaluate the proposed method, we build an extended annotated MS-COCO dataset for this task and introduce an evaluation process that focuses on estimating the similarity between a summarized scene graph and ground-truth scene graphs. Traditional evaluation focuses on calculating precision and recall scores, which involve true positive predictions without balancing precision and recall. Meanwhile, the proposed evaluation process focuses on calculating the F-score of the similarity between a summarized scene graph and ground-truth scene graphs, which aims to balance both false positives and false negatives. Experimental results show that the use of external knowledge in enhancing the relation predictor achieves better results compared with existing methods.
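
The abstract above contrasts reporting precision and recall separately with reporting an F-score that balances the two. As a minimal sketch of that balancing step only, the standard F-beta formula is shown below. How the cited paper derives precision and recall from scene-graph similarity is not described here, so the function name `f_score` and the example values are illustrative assumptions, not the authors' evaluation code.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta=1 gives the harmonic mean of precision and recall.

    The inputs are assumed to be already computed (hypothetically, from some
    similarity-based matching between a summarized and a ground-truth scene graph).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing matches
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# A summary that recalls everything but is only half correct is penalized:
balanced = f_score(precision=0.5, recall=1.0)
```

Because the harmonic mean is dominated by the smaller of the two values, a method cannot raise this score simply by generating more output to inflate one of them.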

“…Contrastive learning approaches to representation learning have recently gained traction due to their success in several domains such as computer vision and natural language processing [6,8,19,23]. The intuition behind these approaches is to bring similar pairs of data points (typically referred to as the anchor and the positive) closer to each other than dissimilar pairs (anchor and negative) in an embedding space.…”

Section: Contrastive Learning (mentioning)

confidence: 99%

Generating Compositional Color Representations from Text

Maheshwari, Jain, Vaddamanu et al. 2022, Preprint (self-citation)

We consider the cross-modal task of producing color representations for text phrases. Motivated by the fact that a significant fraction of user queries on an image search engine follow an (attribute, object) structure, we propose a generative adversarial network that generates color profiles for such bigrams. We design our pipeline to learn composition - the ability to combine seen attributes and objects to unseen pairs. We propose a novel dataset curation pipeline from existing public sources. We describe how a set of phrases of interest can be compiled using a graph propagation technique, and then mapped to images. While this dataset is specialized for our investigations on color, the method can be extended to other visual dimensions where composition is of interest. We provide detailed ablation studies that test the behavior of our GAN architecture with loss functions from the contrastive learning literature. We show that the generative model achieves lower Fréchet Inception Distance than discriminative ones, and therefore predicts color profiles that better match those from real images. Finally, we demonstrate improved performance in image retrieval and classification, indicating the crucial role that color plays in these downstream tasks.
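
The citation statement for this paper describes the contrastive-learning intuition: an anchor should end up closer to its positive than to its negative in the embedding space. One common way to express that idea is a triplet margin loss, sketched below. This is an illustrative assumption and is not claimed to be the loss used by either the citing or the cited paper; the function names, the Euclidean distance, and the `margin` value are all hypothetical choices.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero when the positive is closer to the anchor than the negative by at least `margin`."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


anchor = [0.0, 0.0]
near = [0.0, 1.0]   # distance 1 from the anchor
far = [3.0, 4.0]    # distance 5 from the anchor

# Correct ordering: the positive is already much closer, so no penalty.
loss_satisfied = triplet_margin_loss(anchor, positive=near, negative=far)
# Swapped roles: the "positive" is farther away, so the loss is large.
loss_violated = triplet_margin_loss(anchor, positive=far, negative=near)
```

Minimizing such a loss over many triplets pushes the embedding toward the relative ordering described in the statement, without requiring absolute similarity labels.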

“…To enable retrieval along specialized notions of image similarity, multiple image feature extractors have been developed. Some examples include shapes within content [22], co-occurrences of objects and their relationships [17], or styles [25]. We build on existing image representation methods (e.g.…”

Section: Introduction (mentioning)

confidence: 99%

Self-supervised Multi-view Disentanglement for Expansion of Visual Collections

Jain, Vaddamanu, Maheshwari et al. 2023, Preprint

Figure 1: Left: query collection containing a set of images. Right: each row is a ranked list of images that match the query using three notions of image similarity (objects, style, color composition), each of which we refer to as a 'view'. The top row weighs the views equally. The bottom row (our approach) weighs each view proportional to the inferred intent of the query collection. This enhances relevance (along the primary view, objects) and diversity (along other views, style and color).
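
The caption above contrasts weighing the three views equally with weighing them by inferred intent. A minimal sketch of how per-view weights can change a ranking follows. How the paper infers intent is not described here, so the weights are simply given as inputs; the function names, image identifiers, and all scores are hypothetical.

```python
def combined_score(view_scores, view_weights):
    """Weighted average of per-view similarity scores (weights are normalized first)."""
    total = sum(view_weights.values())
    return sum(view_weights[view] / total * score for view, score in view_scores.items())


def rank(candidates, view_weights):
    """Return candidate ids sorted by combined score, best first."""
    return sorted(
        candidates,
        key=lambda cid: combined_score(candidates[cid], view_weights),
        reverse=True,
    )


# Hypothetical per-view similarities of two candidate images to a query collection.
candidates = {
    "img_a": {"objects": 0.9, "style": 0.2, "color": 0.1},
    "img_b": {"objects": 0.4, "style": 0.9, "color": 0.9},
}

equal_weights = {"objects": 1.0, "style": 1.0, "color": 1.0}
intent_weights = {"objects": 0.8, "style": 0.1, "color": 0.1}  # assumed: objects is the primary view

ranking_equal = rank(candidates, equal_weights)
ranking_intent = rank(candidates, intent_weights)
```

With equal weights the image that matches on style and color ranks first; once objects are weighted as the primary view, the object-matching image moves to the top.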
