2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00207

Scene Graph Generation With External Knowledge and Image Reconstruction

Abstract: Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attributes and relationship prediction, etc. However, existing datasets are biased in terms of object and relationship labels, or often come with noisy and missing annotations, which makes the development of a reliable scene graph prediction model very challenging. In this paper, we propose a novel scene graph generation algorithm with external knowledge and image reconstruction lo…

Cited by 293 publications (205 citation statements); citing publications span 2019 to 2023. References 41 publications.
“…Vision and language are two important aspects of human intelligence to understand the real world. A large amount of research [5,9,23] has been done to bridge these two modalities. Image-text matching is one of the fundamental topics in this field, which refers to measuring the visual-semantic similarity between a sentence and an image.…”
Section: Introduction (mentioning)
confidence: 99%
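The visual-semantic similarity this excerpt mentions is commonly computed by projecting both modalities into a joint embedding space and taking a cosine similarity. Below is a minimal sketch of that idea; the JointEmbedding class, the feature dimensions, and the random inputs are illustrative assumptions, not code from the cited work.

```python
# Minimal sketch of visual-semantic similarity for image-text matching.
# Real systems feed in features from a trained CNN image encoder and an
# RNN/Transformer sentence encoder; here random tensors stand in for both.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, embed_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)  # projects image features
        self.txt_proj = nn.Linear(txt_dim, embed_dim)  # projects sentence features

    def forward(self, img_feat, txt_feat):
        # L2-normalize so the dot product below is cosine similarity
        v = F.normalize(self.img_proj(img_feat), dim=-1)
        t = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return v @ t.t()  # (num_images, num_sentences) similarity matrix

model = JointEmbedding()
sim = model(torch.randn(4, 2048), torch.randn(4, 300))
print(sim.shape)  # torch.Size([4, 4]); the diagonal holds matched pairs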
“…Through our experiments on Visual Genome [30], a dataset containing visual relationship data, we show that the object representations generated by the predicate functions result in meaningful features that can be used to enable few-shot scene graph prediction, exceeding existing transfer learning approaches by 4.16 at recall@1 with 5 labelled examples. We further justify our design decisions by demonstrating that our scene graph model performs on par with existing state-of-the-art models and even outperforms models that also do not utilize external knowledge bases [18], linguistic priors [39,58], or rely on complicated pre- and post-processing heuristics [58,6]. We run ablations where we remove the semantic or spatial components of our functions and demonstrate that both components lead to increased performance, but the semantic component is responsible for most of the performance.…”
Section: Introduction (mentioning)
confidence: 88%
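For readers unfamiliar with the recall@K metric referenced in this excerpt: it measures the fraction of ground-truth (subject, predicate, object) triplets that appear among a model's K highest-scoring predictions. The following is a hedged sketch under that assumption; the function name, triplet format, and example data are illustrative, not the cited papers' evaluation code.

```python
# Hedged sketch of recall@K for relationship / scene graph prediction:
# the share of ground-truth triplets recovered in the top-K predictions.
def recall_at_k(pred_triplets, scores, gt_triplets, k):
    """pred_triplets: list of (subj, pred, obj); scores: parallel floats;
    gt_triplets: set of ground-truth (subj, pred, obj) triplets."""
    ranked = [t for _, t in sorted(zip(scores, pred_triplets),
                                   key=lambda x: x[0], reverse=True)]
    top_k = set(ranked[:k])
    hits = sum(1 for t in gt_triplets if t in top_k)
    return hits / max(len(gt_triplets), 1)

preds = [("man", "riding", "horse"), ("man", "wearing", "hat"), ("horse", "on", "grass")]
scores = [0.9, 0.4, 0.7]
gt = {("man", "riding", "horse"), ("horse", "on", "grass")}
print(recall_at_k(preds, scores, gt, k=1))  # 0.5: only the top-1 prediction matches
```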
“…This includes Iterative Message Passing (IMP) [55], Multi-level Scene Description Network (MSDN) [35], ViP-CNN [33], and MotifNet-freq [58]. The second category includes models such as Factorizable Net [34], KB-GAN [18], and MotifNet [58], which use linguistic priors in the form of word vectors or external information from knowledge bases, while MotifNet also deploys a custom-trained object detector, class-conditioned non-maximum suppression, and heuristically removes all object pairs that do not overlap. While not comparable, we report their numbers for clarity.…”
Section: Baselines (mentioning)
confidence: 99%
“…Scene segmentation (also called scene parsing or semantic segmentation) is one of the fundamental problems in computer vision and has drawn much attention. Recently, thanks to the great success of Convolutional Neural Networks (CNNs) in computer vision [42,68,71,52,25,72,27,80,26], many CNN-based segmentation works have been proposed and have achieved great progress [29,22,81,83,84,70,60]. For example, Long et al. [54] introduce fully convolutional networks (FCN), in which the fully connected layers in standard CNNs are transformed into convolutional layers.…”
Section: Related Work 2.1 Scene Segmentation (mentioning)
confidence: 99%
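The FC-to-convolution transformation described in this last excerpt can be made concrete with a short PyTorch sketch: a fully connected classifier head is rewritten as a convolution carrying the same weights, so the network accepts larger inputs and emits a dense spatial score map. The layer sizes below are illustrative assumptions, not those of Long et al.'s actual FCN.

```python
# Minimal sketch of the FCN idea: re-express a fully connected head as a
# convolution so the network outputs a score map instead of a single vector.
import torch
import torch.nn as nn

fc = nn.Linear(512 * 7 * 7, 10)           # classifier over 7x7x512 features
conv = nn.Conv2d(512, 10, kernel_size=7)  # equivalent convolutional form

# Copy the FC weights into the conv kernel: same parameters, new shape.
conv.weight.data = fc.weight.data.view(10, 512, 7, 7)
conv.bias.data = fc.bias.data

feat = torch.randn(1, 512, 7, 7)
out_fc = fc(feat.flatten(1))   # (1, 10) single prediction
out_conv = conv(feat)          # (1, 10, 1, 1) identical class scores
print(torch.allclose(out_fc, out_conv.flatten(1), atol=1e-5))  # True

# On a larger feature map the conv head yields a dense grid of scores:
print(conv(torch.randn(1, 512, 14, 14)).shape)  # torch.Size([1, 10, 8, 8])
```

Because the two heads share parameters exactly, their outputs agree on a 7x7 feature map, while a larger map produces an 8x8 grid of per-location class scores, which is what makes dense segmentation possible.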