2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.332

Detecting Events and Key Actors in Multi-person Videos

Abstract: Multi-person event recognition is a challenging task, often with many people active in the scene but only a small subset contributing to an actual event. In this paper, we propose a model which learns to detect events in such videos while automatically "attending" to the people responsible for the event. Our model does not use explicit annotations regarding who or where those people are during training and testing. In particular, we track people in videos and use a recurrent neural network (RNN) to represent t…
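The abstract above only sketches the approach, so the following is a minimal illustrative sketch (assumed PyTorch code with made-up layer sizes; not the authors' implementation) of how an RNN over per-person track features can be combined with soft attention to pick out key actors while classifying the event:

```python
# Minimal sketch (assumed PyTorch implementation, not the authors' code) of
# event classification with soft attention over per-person track features.
import torch
import torch.nn as nn


class KeyActorAttentionEvent(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_events=11):
        super().__init__()
        # Shared RNN that turns each person's track features into a temporal state.
        self.person_rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Scoring network producing one unnormalized attention score per person.
        self.attn_score = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_events)

    def forward(self, tracks):
        # tracks: (batch, num_people, time, feat_dim) per-person track features.
        b, p, t, d = tracks.shape
        h, _ = self.person_rnn(tracks.reshape(b * p, t, d))
        person_state = h[:, -1].reshape(b, p, -1)                 # last hidden state per person
        attn = torch.softmax(self.attn_score(person_state).squeeze(-1), dim=1)
        pooled = (attn.unsqueeze(-1) * person_state).sum(dim=1)   # attention-weighted pooling
        return self.classifier(pooled), attn                      # event logits + "key actor" weights


logits, attn = KeyActorAttentionEvent()(torch.randn(2, 10, 16, 512))
```

The returned attention weights are what lets such a model localize the people responsible for an event without person-level supervision, since only the video-level event label is needed for training.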

Cited by 173 publications (180 citation statements) · References 66 publications
“…The earlier approaches are mostly based on a combination of hand-crafted visual features with probabilistic graphical models [1,31,30,43,6,8,17] or AND-OR grammar models [2,46]. Recently, the wide adoption of deep convolutional neural networks (CNNs) has demonstrated significant performance improvements on group activity recognition [3,24,41,45,12,32,59,23,39]. Ibrahim et al. [24] designed a two-stage deep temporal model, which builds an LSTM model to represent the action dynamics of individual people and another LSTM model to aggregate person-level information.…”
Section: Related Work (mentioning)
confidence: 99%
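As a rough illustration of the two-stage idea quoted above, here is a hedged sketch (assumed PyTorch with invented dimensions; not Ibrahim et al.'s released model) in which a person-level LSTM encodes each individual's dynamics, the per-person states are pooled across people, and a group-level LSTM aggregates them for the final activity prediction:

```python
# Rough sketch (assumed PyTorch, not Ibrahim et al.'s code) of a two-stage temporal model:
# person-level LSTM -> pooling across people -> group-level LSTM -> activity classifier.
import torch
import torch.nn as nn


class TwoStageTemporalModel(nn.Module):
    def __init__(self, feat_dim=512, person_dim=256, group_dim=256, num_classes=8):
        super().__init__()
        self.person_lstm = nn.LSTM(feat_dim, person_dim, batch_first=True)
        self.group_lstm = nn.LSTM(person_dim, group_dim, batch_first=True)
        self.classifier = nn.Linear(group_dim, num_classes)

    def forward(self, feats):
        # feats: (batch, num_people, time, feat_dim) per-person CNN features.
        b, p, t, d = feats.shape
        person_seq, _ = self.person_lstm(feats.reshape(b * p, t, d))   # (b*p, t, person_dim)
        person_seq = person_seq.reshape(b, p, t, -1)
        # Max-pool over people at each time step, then model the group's dynamics.
        group_in = person_seq.max(dim=1).values                        # (b, t, person_dim)
        group_seq, _ = self.group_lstm(group_in)
        return self.classifier(group_seq[:, -1])                       # group-activity logits


print(TwoStageTemporalModel()(torch.randn(2, 12, 10, 512)).shape)  # torch.Size([2, 8])
```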
“…The idea of using RNNs for group activity recognition started with [13], which uses Long Short-Term Memory (LSTM) networks to model individual persons and pools their representations into a separate LSTM that models the group activity. In [20], attention pooling is utilized to give higher importance to key actors. Person-centered features are introduced in [24] as input to a hierarchical LSTM.…”
Section: Group Activity Recognition (mentioning)
confidence: 99%
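The attention pooling mentioned in [20] can be contrasted with uniform pooling in a few lines; the sketch below is purely illustrative (assumed tensor shapes and PyTorch API, not code from any of the cited works):

```python
# Sketch contrasting uniform average pooling with attention pooling over per-person
# features (assumed shapes; illustrative only, not the implementation from [20]).
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar relevance score per person

    def forward(self, person_feats):
        # person_feats: (batch, num_people, dim)
        w = torch.softmax(self.score(person_feats).squeeze(-1), dim=1)  # key-actor weights
        return (w.unsqueeze(-1) * person_feats).sum(dim=1)              # weighted sum


x = torch.randn(4, 10, 256)
uniform = x.mean(dim=1)          # every person contributes equally
weighted = AttentionPool()(x)    # key actors receive higher weight
```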
“…We provide the details of each area below. Using general action recognition: methods in this category use techniques from the action recognition literature [7], [8], [13] to detect important events in sports videos. Ramanathan et al. [14] proposed a model that learns to detect events and key actors in basketball games by tracking players using recurrent neural networks (RNNs) and attention. Singh et al. [15] tried to perform action recognition using feature trajectories of images from a first-person-view camera.…”
Section: Related Work (mentioning)
confidence: 99%