Video Visual Relation Detection via Iterative Inference

Shang, Xindi; Li, Yicong; Xiao, Junbin; Ji, Wei; Chua, Tat-Seng

doi:10.1145/3474085.3475263

Cited by 24 publications

(47 citation statements)

References 43 publications


“…For each video in VidVRD dataset, the model needs to predict a set of relation instances, and each relation instance contains a relation triplet with the subject and object trajectories. Following [29,28], we use two evaluation protocols on this dataset: relation detection and relation tagging. For relation detection, we count a predicted relation instance as a correct one, if its relation triplet is the same with a ground truth, and their trajectory vIoU (volume IoU) of the subject and object are both larger than the threshold of 0.5.…”

Section: Methods (mentioning)

confidence: 99%
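
The relation detection criterion quoted above can be sketched in a few lines. This is a minimal illustration, not the official VidVRD evaluation code: the trajectory representation (a dict from frame index to an `(x1, y1, x2, y2)` box), the function names, and the dictionary keys `triplet`, `sub_traj`, and `obj_traj` are assumptions made for the example. vIoU is taken here as the summed per-frame intersection area divided by the union of the two trajectory volumes.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout
Trajectory = Dict[int, Box]              # frame index -> box, assumed layout


def box_area(b: Box) -> float:
    """Area of one box; degenerate boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def box_intersection(a: Box, b: Box) -> float:
    """Overlap area of two boxes in the same frame."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def viou(t1: Trajectory, t2: Trajectory) -> float:
    """Volume IoU: intersection summed over shared frames / union volume."""
    shared = t1.keys() & t2.keys()
    inter = sum(box_intersection(t1[f], t2[f]) for f in shared)
    vol1 = sum(box_area(b) for b in t1.values())
    vol2 = sum(box_area(b) for b in t2.values())
    union = vol1 + vol2 - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(pred: dict, gt: dict, threshold: float = 0.5) -> bool:
    """Same triplet, and both subject and object vIoU above the threshold."""
    return (
        pred["triplet"] == gt["triplet"]
        and viou(pred["sub_traj"], gt["sub_traj"]) > threshold
        and viou(pred["obj_traj"], gt["obj_traj"]) > threshold
    )


# Hypothetical two-frame trajectories for illustration.
t1 = {0: (0, 0, 2, 2), 1: (0, 0, 2, 2)}
t2 = {0: (0, 0, 2, 2), 1: (1, 0, 3, 2)}
score = viou(t1, t2)
```

The strict `>` comparison follows the wording "larger than the threshold of 0.5" in the statement; an implementation might use `>=` instead.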

“…For relation detection, we count a predicted relation instance as a correct one, if its relation triplet is the same with a ground truth, and their trajectory vIoU (volume IoU) of the subject and object are both larger than the threshold of 0.5. In the same way as [29,28], we adopt Mean Average Precision (mAP), Recall@50 (R@50) and Recall@100 (R@100) to evaluate the model performance on relation detection. While in relation tagging, for a predicted relation instance, following [29,28] we only consider the correctness of its relation triplet, and ignore the precision of its subject and object trajectories.…”

Section: Methods (mentioning)

confidence: 99%

“…In the same way as [29,28], we adopt Mean Average Precision (mAP), Recall@50 (R@50) and Recall@100 (R@100) to evaluate the model performance on relation detection. While in relation tagging, for a predicted relation instance, following [29,28] we only consider the correctness of its relation triplet, and ignore the precision of its subject and object trajectories. The evaluation metrics of Precision@1 (P@1), Precision@5 (P@5) and Precision@10 (P@10) are used in relation tagging [29,28].…”

Section: Methods (mentioning)

confidence: 99%
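
As a rough sketch of the ranking metrics named in these statements: Precision@K for relation tagging looks only at the predicted triplets, while Recall@K for relation detection counts how many ground-truth instances are recovered among the top K predictions. The function names and inputs below are assumptions for illustration; in particular, the predictions are assumed to be already sorted by confidence and deduplicated, and the `hit_flags` list is assumed to come from a separate matching step (triplet equality plus the vIoU test).

```python
from typing import List, Sequence, Set, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


def precision_at_k(ranked: Sequence[Triplet], gt: Set[Triplet], k: int) -> float:
    """Fraction of the top-k predicted triplets that appear in the ground truth.

    Divides by k, so a video with fewer than k predictions is penalized;
    that choice is an assumption of this sketch.
    """
    hits = sum(1 for t in ranked[:k] if t in gt)
    return hits / k


def recall_at_k(hit_flags: List[bool], num_gt: int, k: int) -> float:
    """Fraction of ground-truth instances matched by the top-k predictions.

    hit_flags[i] is True when the i-th ranked prediction matched a
    ground-truth instance that no higher-ranked prediction had matched.
    """
    if num_gt == 0:
        return 0.0
    return sum(hit_flags[:k]) / num_gt


# Hypothetical predictions and ground truth for one video.
ranked = [
    ("dog", "chase", "frisbee"),
    ("person", "ride", "bicycle"),
    ("dog", "sit_on", "sofa"),
]
gt = {("dog", "chase", "frisbee"), ("dog", "sit_on", "sofa")}

p_at_1 = precision_at_k(ranked, gt, 1)
p_at_2 = precision_at_k(ranked, gt, 2)
r_at_2 = recall_at_k([True, False, True], num_gt=2, k=2)
```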

“…Besides ImgSGG, there are also increasing research efforts exploring the task of video scene graph generation (VidSGG) [29,20,2,36]. This task provides two task settings based on the granularity of the generated video scene graphs: video-level [29,37,24,28,20,2] and frame-level [36,5]. For video-level VidSGG, models generate scene graphs based on the video clip, where each node encodes the spatio-temporal trajectory of an object, and the connecting edge denotes the relation between two objects.…”

Section: Related Work (mentioning)

confidence: 99%

“…Shang et al [29] first investigated this problem setting, and proposed to extract improved Dense Trajectories features [39] for handling this problem. Later on, some other methods have been proposed to solve this video-level VidSGG problem from different perspectives, including the fully-connected spatio-temporal graph [37], and iterative relation inference [28]. For frame-level VidSGG, a scene graph is generated for each video frame [36,5].…”

Section: Related Work (mentioning)

confidence: 99%


Meta Spatio-Temporal Debiasing for Video Scene Graph Generation

Li, Qu, Kuen et al. 2022

Preprint


Video scene graph generation (VidSGG) aims to parse the video content into scene graphs, which involves modeling the spatio-temporal contextual information in the video. However, due to the long-tailed training data in datasets, the generalization performance of existing VidSGG models can be affected by the spatio-temporal conditional bias problem. In this work, from the perspective of meta-learning, we propose a novel Meta Video Scene Graph Generation (MVSGG) framework to address such a bias problem. Specifically, to handle various types of spatio-temporal conditional biases, our framework first constructs a support set and a group of query sets from the training data, where the data distribution of each query set is different from that of the support set w.r.t. a type of conditional bias. Then, by performing a novel meta training and testing process to optimize the model to obtain good testing performance on these query sets after training on the support set, our framework can effectively guide the model to learn to well generalize against biases. Extensive experiments demonstrate the efficacy of our proposed framework.


Section: Methods (mentioning)

confidence: 99%