2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.332

Detecting Events and Key Actors in Multi-person Videos

Abstract: Multi-person event recognition is a challenging task, often with many people active in the scene but only a small subset contributing to an actual event. In this paper, we propose a model which learns to detect events in such videos while automatically "attending" to the people responsible for the event. Our model does not use explicit annotations regarding who or where those people are during training and testing. In particular, we track people in videos and use a recurrent neural network (RNN) to represent t…
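The abstract above only sketches the approach, so the following is a minimal illustrative sketch (assumed PyTorch code with made-up layer sizes; not the authors' implementation) of how an RNN over per-person track features can be combined with soft attention to pick out key actors while classifying the event:

```python
# Minimal sketch (assumed PyTorch implementation, not the authors' code) of
# event classification with soft attention over per-person track features.
import torch
import torch.nn as nn


class KeyActorAttentionEvent(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_events=11):
        super().__init__()
        # Shared RNN that turns each person's track features into a temporal state.
        self.person_rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Scoring network producing one unnormalized attention score per person.
        self.attn_score = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_events)

    def forward(self, tracks):
        # tracks: (batch, num_people, time, feat_dim) per-person track features.
        b, p, t, d = tracks.shape
        h, _ = self.person_rnn(tracks.reshape(b * p, t, d))
        person_state = h[:, -1].reshape(b, p, -1)                 # last hidden state per person
        attn = torch.softmax(self.attn_score(person_state).squeeze(-1), dim=1)
        pooled = (attn.unsqueeze(-1) * person_state).sum(dim=1)   # attention-weighted pooling
        return self.classifier(pooled), attn                      # event logits + "key actor" weights


logits, attn = KeyActorAttentionEvent()(torch.randn(2, 10, 16, 512))
```

The returned attention weights are what lets such a model localize the people responsible for an event without person-level supervision, since only the video-level event label is needed for training.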

Cited by 173 publications (180 citation statements) · References 66 publications
“…The earlier approaches are mostly based on a combination of hand-crafted visual features with probabilistic graphical models [1,31,30,43,6,8,17] or AND-OR grammar models [2,46]. Recently, the wide adoption of deep convolutional neural networks (CNNs) has demonstrated significant performance improvements on group activity recognition [3,24,41,45,12,32,59,23,39]. Ibrahim et al. [24] designed a two-stage deep temporal model, which builds an LSTM model to represent the action dynamics of individual people and another LSTM model to aggregate person-level information.…”
Section: Related Work (mentioning)
confidence: 99%
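As a rough illustration of the two-stage idea quoted above, here is a hedged sketch (assumed PyTorch with invented dimensions; not Ibrahim et al.'s released model) in which a person-level LSTM encodes each individual's dynamics, the per-person states are pooled across people, and a group-level LSTM aggregates them for the final activity prediction:

```python
# Rough sketch (assumed PyTorch, not Ibrahim et al.'s code) of a two-stage temporal model:
# person-level LSTM -> pooling across people -> group-level LSTM -> activity classifier.
import torch
import torch.nn as nn


class TwoStageTemporalModel(nn.Module):
    def __init__(self, feat_dim=512, person_dim=256, group_dim=256, num_classes=8):
        super().__init__()
        self.person_lstm = nn.LSTM(feat_dim, person_dim, batch_first=True)
        self.group_lstm = nn.LSTM(person_dim, group_dim, batch_first=True)
        self.classifier = nn.Linear(group_dim, num_classes)

    def forward(self, feats):
        # feats: (batch, num_people, time, feat_dim) per-person CNN features.
        b, p, t, d = feats.shape
        person_seq, _ = self.person_lstm(feats.reshape(b * p, t, d))   # (b*p, t, person_dim)
        person_seq = person_seq.reshape(b, p, t, -1)
        # Max-pool over people at each time step, then model the group's dynamics.
        group_in = person_seq.max(dim=1).values                        # (b, t, person_dim)
        group_seq, _ = self.group_lstm(group_in)
        return self.classifier(group_seq[:, -1])                       # group-activity logits


print(TwoStageTemporalModel()(torch.randn(2, 12, 10, 512)).shape)  # torch.Size([2, 8])
```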
“…The idea of using RNNs for group activity recognition started with [13], which uses Long Short-Term Memory (LSTM) networks to model individual persons and pools their representations into a separate LSTM that models the group activity. In [20], attention pooling is utilized to give higher importance to key actors. Person-centered features are introduced in [24] as input to a hierarchical LSTM.…”
Section: Group Activity Recognition (mentioning)
confidence: 99%
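The attention pooling mentioned in [20] can be contrasted with uniform pooling in a few lines; the sketch below is purely illustrative (assumed tensor shapes and PyTorch API, not code from any of the cited works):

```python
# Sketch contrasting uniform average pooling with attention pooling over per-person
# features (assumed shapes; illustrative only, not the implementation from [20]).
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar relevance score per person

    def forward(self, person_feats):
        # person_feats: (batch, num_people, dim)
        w = torch.softmax(self.score(person_feats).squeeze(-1), dim=1)  # key-actor weights
        return (w.unsqueeze(-1) * person_feats).sum(dim=1)              # weighted sum


x = torch.randn(4, 10, 256)
uniform = x.mean(dim=1)          # every person contributes equally
weighted = AttentionPool()(x)    # key actors receive higher weight
```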
“…We provide the details of each area below. Using general action recognition: methods in this category use techniques from the action recognition literature [7], [8], [13] to detect important events in sports videos. Ramanathan et al. [14] proposed a model that learns to detect events and key actors in basketball games by tracking players using recurrent neural networks (RNNs) and attention. Singh et al. [15] tried to perform action recognition using feature trajectories of images from a first-person-view camera.…”
Section: Related Work (mentioning)
confidence: 99%