Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

Pan, Junting; Chen, Siyu; Shou, Mike Zheng; Liu, Yu; Shao, Jing; Li, Hongsheng

doi:10.1109/cvpr46437.2021.00053

Cited by 84 publications

(68 citation statements)

References 51 publications


“…Maintaining feature memory bank for storing and utilizing representations along long term context has been demonstrated to be effective strategies for this task [25,20,16]. We also adapt the feature bank, which saves our pooled feature features and provides previously stored person features of timestamps within a long-range of current video clip.…”

Section: Memory Bank (mentioning)

confidence: 99%

“…Spatio-temporal action localization aims to localize atomic actions of people in videos with 3D bounding boxes, which has attract large efforts in recent years [5,25,20,16,4,9]. Generally, there are two main factors showing fundamental influence on the performance of this task, i.e.…”

Section: Introduction (mentioning)

confidence: 99%

“…The design of video networks has been widely studied [4,19,3] and greatly enhance the performance of downstream tasks. Besides, pretraining such networks on large-scale networks is also demonstrated to be effective [20,16], e.g. pretrain on Kinetics700 [2].…”

Section: Introduction (mentioning)

confidence: 99%

“…For relation modeling, different approaches has been studied in the fields of computer vision [23,1], social networks [12,10] and nature language processing [22]. Specifically, transformed-based relation modeling has been proved for improving the spatio-temporal localization task [25,20,16].…”

Section: Introduction (mentioning)

confidence: 99%


Relation Modeling in Spatio-Temporal Action Localization

Feng, Jiang, Huang et al. 2021

Preprint


This paper presents our solution to the AVA-Kinetics Crossover Challenge of ActivityNet workshop at CVPR 2021. Our solution utilizes multiple types of relation modeling methods for spatio-temporal action detection and adopts a training strategy to integrate multiple relation modeling in end-to-end training over the two large-scale video datasets. Learning with memory bank and finetuning for long-tailed distribution are also investigated to further improve the performance. In this paper, we detail the implementations of our solution and provide experiments results and corresponding discussions. We finally achieve 40.67 mAP on the test set of AVA-Kinetics.

“…Action detection is a task that consists in detecting people and recognizing their actions along videos. Being fundamental to video understanding, action detection has gained attention in recent years [5], [2], [6], leading to remarkable advances.…”

Section: Introduction (mentioning)

confidence: 99%

Spatio-Temporal Context for Action Detection

Calderó, Varas, Bou-Balust 2021

Preprint


Research in action detection has grown in the recent years, as it plays a key role in video understanding. Modelling the interactions (either spatial or temporal) between actors and their context has proven to be essential for this task. While recent works use spatial features with aggregated temporal information, this work proposes to use nonaggregated temporal information. This is done by adding an attention based method that leverages spatio-temporal interactions between elements in the scene along the clip. The main contribution of this work is the introduction of two cross attention blocks to effectively model the spatial relations and capture short range temporal interactions. Experiments on the AVA dataset show the advantages of the proposed approach that models spatio-temporal relations between relevant elements in the scene, outperforming other methods that model actor interactions with their context by +0.31 mAP.


Human‐centered attention‐aware networks for action recognition

Liu 2022

Int J of Intelligent Sys


Action recognition in video is a research hot spot in the field of computer vision. Learning important clues in video context has significant effect to promote the interaction prediction and gesture recognition. Most existing methods infer the interactions between actor and context through relational reasoning methods. While these relational features contribute to improve the salience of action performance, the error will occur when the salient region is irrelevant to the recognized action. Therefore, this paper establishes a human‐centered attention mechanism that dynamically highlights regions associated with action recognition according to target appearance to selectively recognize the human‐object interaction action. The effectiveness of the proposed mechanism is verified on the AVA2.2 data set, and the visualized attention map further shows that the proposed attention mechanism can effectively recognize human‐centered strongly correlated action.
