“…In the less common two-stage approach [33,10,4,33], attributes of the scene graph are used in a second training step to refine the results produced by the first stage. Much more common are one-stage approaches [4,45,5,37,39,21,18,22,17,24], which focus only on object detection and relationship classification while largely neglecting intrinsic features. The proposed BGT-Net follows a one-stage approach and offers the following advantages over prior work: (1) it uses object-object communication, which improves performance in scene graph generation (SGG);…”