Fully Convolutional Scene Graph Generation

Liu, Hengyue; Yan, Ning; Mortazavi, Masood; Bhanu, Bir

doi:10.1109/cvpr46437.2021.01138

Cited by 64 publications

(29 citation statements)

References 92 publications

(133 reference statements)

“…than another one-stage model FCSGG [60]. Our model is also competitive compared with recent two-stage models, and outperforms state-of-the-art visual-based methods.…”

Section: Visual Genome (mentioning)

confidence: 63%

“…Compared to the boom of two-stage approaches, one-stage approaches are still in their infancy and have the advantage of being simple, fast and easy to train. To the best of our knowledge, FCSGG [60] is currently the only one-stage scene graph generation framework that encodes objects as box center points and relationships as 2D vector fields. While FCSGG model being lightweight and fast speed, it has a significant performance gap compared to other two-stage methods.…”

Section: Scene Graph Generation (mentioning)

confidence: 99%

“…[Table columns: Method | AP50 | PredCLS, SGCLS, SGDET (each with R@20, R@50, mR@20, mR@50) | #params (M) | FPS] Note that the number of FCSGG is directly taken from [60] due to unavailable code.…”

Section: Set Prediction Loss For Triplet Detection (mentioning)

confidence: 99%

“…We compare scores of R@𝐾 and mR@𝐾, number of parameters and inference speed on SGDET (FPS) with several two-stage models and one-stage model FCSGG [60] in Table 1. Models that not only use visual appearance, but also prior knowledge (e.g.…”

Section: Visual Genome (mentioning)

confidence: 99%

RelTR: Relation Transformer for Scene Graph Generation

Yang, Yang, Rosenhahn

2022

Preprint

Different objects in the same scene are more or less related to each other, but only a limited number of these relationships are noteworthy. Inspired by DETR, which excels in object detection, we view scene graph generation as a set prediction problem and propose an end-to-end scene graph generation model RelTR which has an encoder-decoder architecture. The encoder reasons about the visual feature context while the decoder infers a fixed-size set of triplets subject-predicate-object using different types of attention mechanisms with coupled subject and object queries. We design a set prediction loss performing the matching between the ground truth and predicted triplets for the end-to-end training. In contrast to most existing scene graph generation methods, RelTR is a one-stage method that predicts a set of relationships directly only using visual appearance without combining entities and labeling all possible predicates. Extensive experiments on the Visual Genome and Open Images V6 datasets demonstrate the superior performance and fast inference of our model.

“…Scene graphs With the aforementioned goal, scene graphs, were pioneered by [4], where detected objects from input images are modeled as nodes and semantic relations between them are modeled as edges with semantic relational labels. Recently, more approaches [8], [19], [20], [21] are proposed on top of this and deliver state-of-the-art performance and efficiency on public benchmarks like Visual Genome [12]. Naturally, all of them rely on ground truth provided by the dataset for training deep scene graph generation models, and this holds for each new domain-specific task.…”

Section: Related Work (mentioning)

confidence: 99%