HiRF: Hierarchical Random Field for Collective Activity Recognition in Videos

Amer, Mohamed R.; Peng, Lei; Todorovic, Sinisa

doi:10.1007/978-3-319-10599-4_37

Cited by 100 publications

(73 citation statements)

References 22 publications

“…It should be noted that the best performing method, i.e. [1], uses its own person detections which biases the comparison with the other methods.…”

Section: Comparison With State Of the Art Methods (mentioning)

confidence: 99%

Recognition of Group Activities in Videos Based on Single- and Two-Person Descriptors

Lathuiliere, Evangelidis, Horaud, 2017

2017 IEEE Winter Conference on Applications of Computer Vision (WACV)

Group activity recognition from videos is a very challenging problem that has barely been addressed. We propose an activity recognition method using group context. In order to encode both single-person description and two-person interactions, we learn mappings from high-dimensional feature spaces to low-dimensional dictionaries. In particular, the proposed two-person descriptor takes into account geometric characteristics of the relative pose and motion between the two persons. Both single-person and two-person representations are then used to define unary and pairwise potentials of an energy function, whose optimization leads to the structured labeling of persons involved in the same activity. An interesting feature of the proposed method is that, unlike the vast majority of existing methods, it is able to recognize multiple distinct group activities occurring simultaneously in a video. The proposed method is evaluated with datasets widely used for group activity recognition, and is compared with several baseline methods.

“…6 plots the confusion matrices obtained with three methods and with our method. Note that it is not possible to show the confusion matrix obtained with [1] because only average results are provided in this paper. On dataset A, we compare our method with [35] which uses extra annotations.…”

Section: Comparison With State Of the Art Methods (mentioning)

confidence: 99%

“…To understand the scene of multiple persons, the model needs to not only describe the individual action of each actor in the context, but also infer their collective activity. The ability to accurately capture relevant relation between actors and perform relational reasoning is crucial for understanding group activity of multiple people [30,1,7,23,39,12,24,59]. However, modeling the relation between actors is challenging, as we only have access to individual action labels and collective activity labels, without knowledge of the underlying interaction information.…”

Section: Introduction (mentioning)

confidence: 99%

Learning Actor Relation Graphs for Group Activity Recognition

Wang et al., 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Modeling relation between actors is important for recognizing group activity in a multi-person scene. This paper aims at learning discriminative relation between actors efficiently using deep models. To this end, we propose to build a flexible and efficient Actor Relation Graph (ARG) to simultaneously capture the appearance and position relation between actors. Thanks to the Graph Convolutional Network, the connections in ARG could be automatically learned from group activity videos in an end-to-end manner, and the inference on ARG could be efficiently performed with standard matrix operations. Furthermore, in practice, we come up with two variants to sparsify ARG for more effective modeling in videos: spatially localized ARG and temporal randomized ARG. We perform extensive experiments on two standard group activity recognition datasets: the Volleyball dataset and the Collective Activity dataset, where state-of-the-art performance is achieved on both datasets. We also visualize the learned actor graphs and relation features, which demonstrate that the proposed ARG is able to capture the discriminative relation information for group activity recognition.

“…The structure of the hidden layer is implicitly inferred during learning. In [32], collective activities involving groups of people are recognized using a Hierarchical Random Field (HiRF). The higher order temporal structures in the videos are captured by using hierarchical dependencies between the variables and learning is specified in a max-margin framework.…”

Section: Related Research (mentioning)

confidence: 99%

Activity recognition using a supervised non-parametric hierarchical HMM

Raman, Maybank, 2016

Neurocomputing

The problem of classifying human activities occurring in depth image sequences is addressed. The 3D joint positions of a human skeleton and the local depth image pattern around these joint positions define the features. A two level hierarchical Hidden Markov Model (H-HMM), with independent Markov chains for the joint positions and depth image pattern, is used to model the features. The states corresponding to the H-HMM bottom level characterize the granular poses while the top level characterizes the coarser actions associated with the activities. Further, the H-HMM is based on a Hierarchical Dirichlet Process (HDP), and is fully non-parametric with the number of pose and action states inferred automatically from data. This is a significant advantage over classical HMM and its extensions. In order to perform classification, the relationships between the actions and the activity labels are captured using multinomial logistic regression. The proposed inference procedure ensures alignment of actions from activities with similar labels. Our construction enables information sharing, allows incorporation of unlabelled examples and provides a flexible factorized representation to include multiple data channels. Experiments with multiple real world datasets show the efficacy of our classification approach.
