GPS-Net: Graph Property Sensing Network for Scene Graph Generation

Lin, Xin; Ding, Changxing; Zeng, Jinquan; Tao, Dacheng

doi:10.1109/cvpr42600.2020.00380

Cited by 176 publications

(150 citation statements)

References 54 publications

“…This bias arrives from the long-tailed relationship distribution. The GPS-Net [21] tackled this problem with FS and BA which worked well compared to the previous works. The overall performance of the model could be improved as well as improvements in mean Recall@K were achieved, which gives reasoning about the positive effect of their approach in handling the dataset bias.…”

Section: Related Work (mentioning)

confidence: 86%

“…In the less common two-stage approach [33,10,4,33], attributes of the scene graph are used in the second training step to refine the results produced by the first stage. Much more common are the one-stage approaches [4,45,5,37,39,21,18,22,17,24] which focus only on object detection and relationship classification, while almost neglecting intrinsic features. The proposed BGT-Net follows a one step approach and has the following advantages as compared to the literature work: (1) It uses object-object communication which improves the performance in SGG;…”

Section: Related Work (mentioning)

confidence: 99%

“…To handle the long-tailed relationship distribution present in the Visual Genome dataset, the procedure of softening this distribution and adapting the bias term for every subject-object pair form [21] is adopted. The used features in this step is different than in the GPS-Net [21], but the principle stays the same. The softening of the relationship distribution is done by applying a log-softmax function to the original distribution.…”

Section: Frequency Softening (FS), Bias Adaptation (BA) (mentioning)

confidence: 99%
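
The excerpt above gives no formula, so the following is only a minimal sketch of the frequency-softening step as that statement describes it: a log-softmax applied to the predicate distribution of a single subject-object pair. The function names and the example counts are hypothetical, and the bias adaptation term is left out because its form is not given in these excerpts.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax over a list of floats."""
    m = max(values)
    log_sum = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_sum for v in values]


def soften_frequencies(counts):
    """Turn raw predicate counts for one subject-object pair into a
    softened log-distribution (assumption: the log-softmax is applied
    directly to the normalized frequencies)."""
    total = sum(counts)
    probs = [c / total for c in counts]
    return log_softmax(probs)


# Hypothetical long-tailed predicate counts for one subject-object pair.
counts = [900, 90, 10]
softened = soften_frequencies(counts)
print(softened)
```

Under this assumption the softened values keep the original ranking of the predicates, but the gap between the head and tail classes is much smaller than in the raw log-frequencies, which is the intended effect of softening.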

“…In this procedure, a log-softmax function is applied to the subject-object pairwise relationship distribution. Following this Frequency Softening (FS), a Bias Adaptation (BA) approach [21] is used. The bias for every subject-object is controlled by the bias adaptation term which takes scene-specific inputs to vary the amount of added bias.…”

Section: Introduction (mentioning)

confidence: 99%

“…(3) For every object, a second transformer encoder is used to gather information for the edges. (4) To tackle the bias in the relationship distribution, FS and BA [21] is adopted. The evaluation efficacy of the proposed BGT-Net is performed on three SGG datasets: Visual Genome (VG) [13], OpenImages (OI) [14], and Visual Relationship Detection (VRD) [22].…”

Section: Introduction (mentioning)

confidence: 99%

BGT-Net: Bidirectional GRU Transformer Network for Scene Graph Generation

Dhingra; Ritter; Kunz

2021

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Scene graphs are nodes and edges consisting of objects and object-object relationships, respectively. Scene graph generation (SGG) aims to identify the objects and their relationships. We propose a bidirectional GRU (BiGRU) transformer network (BGT-Net) for the scene graph generation for images. This model implements novel object-object communication to enhance the object information using a BiGRU layer. Thus, the information of all objects in the image is available for the other objects, which can be leveraged later in the object prediction step. This object information is used in a transformer encoder to predict the object class as well as to create object-specific edge information via the use of another transformer encoder. To handle the dataset bias induced by the long-tailed relationship distribution, softening with a log-softmax function and adding a bias adaptation term to regulate the bias for every relation prediction individually showed to be an effective approach. We conducted an elaborate study on experiments and ablations using open-source datasets, i.e., Visual Genome, Open-Images, and Visual Relationship Detection datasets, demonstrating the effectiveness of the proposed model over state of the art.

