2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.00053

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

Abstract: Localizing persons and recognizing their actions from videos is a challenging task towards high-level video understanding. Recent advances have been achieved by modeling either "actor-actor" or "actor-context" relations. However, such direct first-order relations are not sufficient for localizing actions in complicated scenes. Some actors might be indirectly related via objects or background context in the scene. Such indirect relations are crucial for determining the action labels but are mostly ignored by exi…

Cited by 84 publications
(68 citation statements)
References 51 publications
“…Maintaining feature memory bank for storing and utilizing representations along long term context has been demonstrated to be effective strategies for this task [25,20,16]. We also adapt the feature bank, which saves our pooled feature features and provides previously stored person features of timestamps within a long-range of current video clip.…”
Section: Memory Bank (mentioning)
confidence: 99%
“…Spatio-temporal action localization aims to localize atomic actions of people in videos with 3D bounding boxes, which has attract large efforts in recent years [5,25,20,16,4,9]. Generally, there are two main factors showing fundamental influence on the performance of this task, i.e.…”
Section: Introduction (mentioning)
confidence: 99%
“…Action detection is a task that consists in detecting people and recognizing their actions along videos. Being fundamental to video understanding, action detection has gained attention in recent years [5], [2], [6], leading to remarkable advances.…”
Section: Introduction (mentioning)
confidence: 99%