2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023
DOI: 10.1109/wacv56688.2023.00014
Composite Relationship Fields with Transformers for Scene Graph Generation

Abstract: Scene graph generation (SGG) methods extract relationships between objects. While most methods focus on improving top-down approaches, which build a scene graph from objects detected by an off-the-shelf object detector, there is limited work on bottom-up approaches, which jointly detect objects and their relationships in a single stage. In this work, we present a novel bottom-up SGG approach that represents relationships using Composite Relationship Fields (CoRF). CoRF turns relationship detec…

Cited by 2 publications (9 citation statements)
References 90 publications (114 reference statements)
“…Instead of pooling features of various shapes, extracting features at multiple pixels is much faster and consumes less memory. Several works [17], [18], [19] explore such point-based entity representation for SGG. Pixel2Graph [17] grounds edges at the midpoints between the bounding box centers of subjects and objects (referred to as subject and object centers for the rest of the paper).…”
Section: Feature Representations
confidence: 99%
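As a rough illustration of the grounding scheme described in this citation statement (a sketch, not code from Pixel2Graph or any cited work; box and function names are assumptions), an edge can be anchored at the midpoint between the subject's and object's bounding box centers:

```python
def box_center(box):
    # box = (x1, y1, x2, y2) in pixel coordinates
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def edge_midpoint(subject_box, object_box):
    # Ground the relationship at the midpoint between the
    # subject center and the object center.
    (sx, sy) = box_center(subject_box)
    (ox, oy) = box_center(object_box)
    return ((sx + ox) / 2.0, (sy + oy) / 2.0)

# Subject box centered at (5, 5), object box centered at (25, 5):
print(edge_midpoint((0, 0, 10, 10), (20, 0, 30, 10)))  # (15.0, 5.0)
```

The edge location is then a single pixel at which relationship features can be read out, rather than a box-shaped region to pool over.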
“…2) Point-based: single-pixel features extracted from bounding box centers. These methods [17], [18], [19] utilize anchor-free detectors [11], [13], [14] to ground entities and relationships in a regression fashion. 3) Query-based: fixed-size learnable embeddings.…”
Section: Introduction
confidence: 99%
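A minimal sketch of the point-based representation contrasted here (assumed names and shapes; not an implementation from any cited method): a single-pixel feature is read directly from the feature map at an entity's box center, with no RoI pooling over the box:

```python
import numpy as np

def point_feature(feature_map, center):
    # feature_map: (H, W, C) array; center: (x, y) in pixel coords.
    # Single-pixel lookup at the box center -- no pooling over a region.
    x, y = center
    return feature_map[int(round(y)), int(round(x))]

# Toy feature map with a distinctive vector at pixel (row=3, col=5):
fmap = np.zeros((8, 8, 4), dtype=np.float32)
fmap[3, 5] = 1.0
feat = point_feature(fmap, (5.0, 3.0))  # center (x=5, y=3)
print(feat)  # 4-dim feature vector at that pixel
```

This single lookup per entity is what makes the point-based family faster and lighter in memory than pooling features over variably shaped boxes.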