2021
DOI: 10.1109/tcsvt.2020.3032650
Improving Visual Relationship Detection With Two-Stage Correlation Exploitation

Cited by 7 publications (10 citation statements)
References 45 publications
“…Table 3 provides a detailed comparison with the existing state-of-the-art relation detection models, using the same metrics to evaluate the recognition rate of our method. Our method is nearly one percentage point higher than the best model, TCE [14]: PTAT (ours) outperforms the current state of the art, e.g., 73.65% vs. 72.20% for R@100, k=1 in predicate detection.…”
Section: Experiments on Visual Genome
confidence: 80%
“…Liu et al. [1] proposed two novel modules to discover the common distribution space and the latent relationship associations, mapping pairs of object features into translation subspaces to induce discriminative relationship clustering. Wang et al. [14] proposed a fast VRD method based on recurrent attention and negative sampling that integrates the attention mechanism into the detection pipeline, enabling the network to focus on specific parts of an image when scoring predicates for a given object pair. Chiou et al. [8] imitated human reasoning mechanisms in the RVL-BERT model, which learns visual and linguistic commonsense knowledge via self-supervised pretraining to perform relational reasoning.…”
Section: Related Work, A. Relationship Detection
confidence: 99%
“…Although existing methods have achieved superior performance in relationship detection, two key dilemmas remain in this field: the combination-explosion problem and the non-exclusive-label problem. (1) The combination-explosion problem: prior works [11] follow the naive proposal scheme in which, if N objects are extracted from an image, N(N-1) ordered object pairs are generated in the object-pair proposal stage. Even worse, multiple correlated relationships usually exist between two objects, and reserving more visual relationship triplets makes the number of combinations grow explosively.…”
Section: Introduction
confidence: 99%
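The quadratic growth described in the statement above can be illustrated with a minimal sketch: enumerating every ordered (subject, object) pair among N detections yields N(N-1) candidates before any predicate is even scored. The object labels below are hypothetical, not taken from the paper.

```python
from itertools import permutations

def candidate_pairs(objects):
    """All ordered (subject, object) pairs among N detections: N*(N-1)."""
    return list(permutations(objects, 2))

# N = 4 hypothetical detections from one image
detections = ["person", "horse", "hat", "fence"]
pairs = candidate_pairs(detections)
print(len(pairs))  # 4 * 3 = 12 candidate pairs
```

Doubling the number of detections roughly quadruples the candidate set (e.g., 10 objects already yield 90 pairs), which is why proposal-stage pruning matters for VRD pipelines.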