2020
DOI: 10.1007/978-3-030-58545-7_11

Joint Learning of Social Groups, Individuals Action and Sub-group Activities in Videos

Abstract: The state-of-the-art solutions for human activity understanding from a video stream formulate the task as a spatio-temporal problem which requires joint localization of all individuals in the scene and classification of their actions or group activity over time. Who is interacting with whom, e.g. not everyone in a queue is interacting with each other, is often not predicted. There are scenarios where people are best split into sub-groups, which we call social groups, and each social group may be engaged …
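To make the task formulation above concrete, here is a minimal, hypothetical sketch (not the authors' code) of what a per-frame prediction could contain: detected individuals with boxes and action labels, a partition of those individuals into social groups, and one activity label per group. All class and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PersonPrediction:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image coordinates
    action: str                             # individual action label, e.g. "standing"


@dataclass
class FramePrediction:
    people: List[PersonPrediction] = field(default_factory=list)
    # Partition of person indices into social groups, e.g. [[0], [1, 2]]
    groups: List[List[int]] = field(default_factory=list)
    # One activity label per social group, aligned with `groups`
    group_activities: List[str] = field(default_factory=list)


# Example: a queue in which only two of the three people actually interact.
frame = FramePrediction(
    people=[
        PersonPrediction(box=(10, 20, 50, 120), action="standing"),
        PersonPrediction(box=(60, 22, 100, 118), action="talking"),
        PersonPrediction(box=(110, 25, 150, 119), action="talking"),
    ],
    groups=[[0], [1, 2]],
    group_activities=["queueing", "conversation"],
)
```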

Cited by 55 publications (29 citation statements). References 62 publications.

Citation statements:
“…Inspired by a transformer network [80] which relies on self-attention mechanisms to allow the network to adaptively extract the most relevant information and relationships, Gavrilyuk et al [21] proposed an actor-transformers network which learns interactions between the actors and adaptively extracts the important information for activity recognition.…”
Section: Deep Relationship Modeling (mentioning, confidence: 99%)
“…Several works tackle this problem from a graph-based perspective [40,63,100,101], such as applying Graph Convolutional Networks (GCNs) [49,96]. More recent works utilize attention modeling [63,73,98,103], including using Transformers [26,57] with a focus on determining the most critical persons [26,72,96,103], groups [24,57], or interactions [101]. Existing works have primarily used RGB- and optical-flow-based features with RoIAlign [33] to represent individuals [6,73,96,100].…”
Section: Group Activity Recognition (mentioning, confidence: 99%)
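The citance above mentions graph-based modeling (GCNs over actors whose node features come from RoIAlign). Below is a generic, minimal graph-convolution step over an actor graph; the distance-based adjacency is a hypothetical heuristic used only to make the example runnable, not the relation construction of any of the cited works.

```python
import torch
import torch.nn as nn


class ActorGCNLayer(nn.Module):
    """One graph-convolution step: aggregate neighbouring actors' features
    through a row-normalised adjacency matrix, then transform them."""

    def __init__(self, in_dim=256, out_dim=256):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x:   (num_actors, in_dim)            per-actor features (e.g. from RoIAlign)
        # adj: (num_actors, num_actors)        relation weights between actors
        adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)  # row-normalise
        return torch.relu(self.linear(adj @ x))


def distance_adjacency(centers, sigma=50.0):
    """Illustrative adjacency: spatially close actors get larger edge weights."""
    d = torch.cdist(centers, centers)  # pairwise box-centre distances
    return torch.exp(-(d ** 2) / (2 * sigma ** 2))


# Toy usage: 5 actors with 2-D box centres and 256-d appearance features.
centers = torch.rand(5, 2) * 200
feats = torch.randn(5, 256)
refined = ActorGCNLayer()(feats, distance_adjacency(centers))
print(refined.shape)  # torch.Size([5, 256])
```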
“…COMPOSER facilitates compositional learning and a high-level semantic understanding of the video with a Multiscale Transformer that performs relational reasoning over these tokens scale by scale. … and crowd behavior analysis [24,26,72,104]. Compared to the single-person atomic action recognition task, GAR requires addressing two additional challenges.…”
Section: Introduction (mentioning, confidence: 99%)
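As a loose illustration of the scale-by-scale relational reasoning this citance attributes to COMPOSER: refine fine-scale tokens (persons) with self-attention, pool them into coarser tokens (groups), and refine those again. This is a simplified sketch under assumed shapes and grouping, not the COMPOSER architecture.

```python
import torch
import torch.nn as nn


class MultiscaleReasoning(nn.Module):
    """Refine person tokens with self-attention, pool them into group tokens,
    and refine the group tokens in turn (scale by scale)."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        person_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        group_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.person_encoder = nn.TransformerEncoder(person_layer, num_layers=1)
        self.group_encoder = nn.TransformerEncoder(group_layer, num_layers=1)

    def forward(self, person_tokens, group_index):
        # person_tokens: (batch, num_persons, dim)
        # group_index:   list of person-index lists, e.g. [[0, 1, 2], [3, 4]]
        persons = self.person_encoder(person_tokens)  # fine-scale reasoning
        groups = torch.stack(
            [persons[:, idx].mean(dim=1) for idx in group_index], dim=1
        )                                             # pool persons into group tokens
        groups = self.group_encoder(groups)           # coarse-scale reasoning
        return persons, groups


# Toy usage: 2 clips, 5 persons split into two groups.
p, g = MultiscaleReasoning()(torch.randn(2, 5, 256), [[0, 1, 2], [3, 4]])
print(p.shape, g.shape)  # torch.Size([2, 5, 256]) torch.Size([2, 2, 256])
```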