2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021
DOI: 10.1109/cvprw53098.2021.00376
|View full text |Cite
|
Sign up to set email alerts
|

ObjectGraphs: Using Objects and a Graph Convolutional Network for the Bottom-up Recognition and Explanation of Events in Video

Abstract: In this paper a novel bottom-up video event recognition approach is proposed, ObjectGraphs, which utilizes a rich frame representation and the relations between objects within each frame. Following the application of an object detector (OD) on the frames, graphs are used to model the object relations and a graph convolutional network (GCN) is utilized to perform reasoning on the graphs. The resulting object-based frame-level features are then forwarded to a long short-term memory (LSTM) network for video event… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
5

Citation Types

0
32
0

Year Published

2021
2021
2022
2022

Publication Types

Select...
4
2

Relationship

5
1

Authors

Journals

citations
Cited by 10 publications
(33 citation statements)
references
References 29 publications
0
32
0
Order By: Relevance
“…The introduction of deep learning approaches has offered major performance leaps in video event recognition [5]- [14]. Most of these methods operate in a top-down fashion [6], [7], [10]- [14], i.e.…”
Section: Introductionmentioning
confidence: 99%
See 3 more Smart Citations
“…The introduction of deep learning approaches has offered major performance leaps in video event recognition [5]- [14]. Most of these methods operate in a top-down fashion [6], [7], [10]- [14], i.e.…”
Section: Introductionmentioning
confidence: 99%
“…Motivated by cognitive and psychological studies as described above, recent bottom-up action and event recognition approaches [5], [9] represent a video frame using not only features extracted from the entire frame but also features representing the main objects of the frame. More specifically, they utilize an object detector to derive a set of objects depicting semantically coherent regions of the video frames, a backbone network to derive a feature representation of these objects, and an attention mechanism combined with a graph neural network (GNN) to classify the video.…”
Section: Introductionmentioning
confidence: 99%
See 2 more Smart Citations
“…Graph Neural Networks (GNNs) have achieved state-of-the-art performance in learning over such relational data in various graph-based machine learning tasks, such as node classification, link prediction, and graph classification [9,20,29,61,68,69]. Due to their superior performance, GNNs are now widely used in many applications such as recommendation systems, credit issuing, traffic forecasting, drug discovery, and medical diagnosis [3,7,17,27,37,63].…”
Section: Introductionmentioning
confidence: 99%