2019 IEEE International Conference on Image Processing (ICIP) 2019
DOI: 10.1109/icip.2019.8803650
Hierarchical Graph-RNNs for Action Detection of Multiple Activities

Abstract: In this paper, we propose an approach that spatially localizes the activities in a video frame, where each person can perform multiple activities at the same time. Our approach takes the temporal scene context as well as the relations between the actions of the detected persons into account. While the temporal context is modeled by a temporal recurrent neural network (RNN), the relations between the actions are modeled by a graph RNN. Both networks are trained together, and the proposed approach achieves state-of-the-art results…
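The abstract describes two coupled recurrent models: a temporal RNN that summarizes scene context over time, and a graph RNN that passes information between detected persons so that each actor's action predictions can depend on the others. A minimal sketch of one graph-RNN update over a fully connected actor graph, with hypothetical weight names (`W_self`, `W_msg`, `W_ctx`) and a simplified tanh recurrence in place of the paper's actual gated cells:

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_rnn_step(h, context, W_self, W_msg, W_ctx):
    """One graph-RNN update: each actor node aggregates the mean of the
    other actors' hidden states (fully connected actor graph) plus a
    shared temporal-context vector, then applies a tanh recurrence."""
    n = h.shape[0]
    # mean over all *other* nodes (simple message aggregation)
    msg = (h.sum(axis=0, keepdims=True) - h) / max(n - 1, 1)
    return np.tanh(h @ W_self + msg @ W_msg + context @ W_ctx)

d = 8                                   # hidden size (illustrative)
n_actors = 3
h = rng.standard_normal((n_actors, d))  # per-actor hidden states
context = rng.standard_normal((1, d))   # temporal scene context, e.g. from the temporal RNN
W_self, W_msg, W_ctx = (0.1 * rng.standard_normal((d, d)) for _ in range(3))

h_next = graph_rnn_step(h, context, W_self, W_msg, W_ctx)
print(h_next.shape)  # (3, 8)
```

The joint training mentioned in the abstract would backpropagate through both the graph update and the temporal-context RNN; this sketch only shows the per-frame message passing.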

Cited by 3 publications (4 citation statements) · References 23 publications
“…However, we compare our approach with the state of the art for fully supervised action detection in Table 4. Our approach is competitive with fully supervised approaches [5,6,7,8]. When we train our approach with full supervision, we improve over SlowFast [10] by +1.1% mAP on the validation set.…”
Section: Results
confidence: 93%
“…The graph connects all actors and we use a graph RNN to infer the action probabilities for each actor based on the spatial and temporal context. In our approach, we use the hierarchical Graph RNN (HGRNN) [7] where the features per node are obtained by ROI pooling over the 3D CNN feature maps. The HGRNN and 3D CNN are learned using the MIML loss (1).…”
Section: Actor-Action Association
confidence: 99%
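This citation statement describes how the HGRNN obtains per-node features: ROI pooling over 3D CNN feature maps for each detected actor, followed by a multi-label prediction since each person may perform several activities at once. A simplified sketch under stated assumptions: the feature map is already pooled over time, ROI pooling is reduced to a max over the box region, and the output head is a single linear layer with independent sigmoids (all names, `roi_max_pool` included, are hypothetical, not the paper's API):

```python
import numpy as np

def roi_max_pool(feat_map, box):
    """Max-pool a (C, H, W) feature map over an actor's bounding box
    (x0, y0, x1, y1 in feature-map coordinates) -> (C,) node feature."""
    x0, y0, x1, y1 = box
    region = feat_map[:, y0:y1, x0:x1]
    return region.reshape(region.shape[0], -1).max(axis=1)

rng = np.random.default_rng(1)
C, H, W = 16, 14, 14
feat_map = rng.standard_normal((C, H, W))  # stand-in for a 3D CNN feature map
boxes = [(0, 0, 7, 7), (7, 7, 14, 14)]     # two detected actors

nodes = np.stack([roi_max_pool(feat_map, b) for b in boxes])  # (2, C)

# Multi-label head: independent sigmoids per action class, since each
# person may perform several activities simultaneously (MIML setting).
n_actions = 5
W_out = 0.1 * rng.standard_normal((C, n_actions))
probs = 1.0 / (1.0 + np.exp(-(nodes @ W_out)))
print(probs.shape)  # (2, 5)
```

In the cited approach these node features would feed the graph RNN before the action probabilities are predicted, and the whole pipeline would be trained end to end with the MIML loss.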