2007 IEEE 11th International Conference on Computer Vision 2007
DOI: 10.1109/iccv.2007.4409105

Retrieving actions in movies

Abstract: We address recognition and localization of human actions in realistic scenarios. In contrast to the previous work studying human actions in controlled settings, here we train and test algorithms on real movies with substantial variation of actions in terms of subject appearance, motion, surrounding scenes, viewing angles and spatio-temporal extents. We introduce a new annotated human action dataset and use it to evaluate several existing methods. We in particular focus on boosted space-time window classifiers …

Cited by 353 publications (331 citation statements)
References 27 publications (52 reference statements)

“…As described in Section 2, there exist several public datasets that are widely referred and tested in the literature of human activity recognition [8,16,13,5,20]. However, all these datasets (except the Soccer dataset) are taken from an approximate side view and they have human figures presented in high-resolution imagery (Fig.…”
Section: Aerial View Activity Classification Challenge (mentioning)
confidence: 99%
“…Recently, more challenging datasets were constructed by collecting realistic videos from movies [13,12,14]. These movie scenes are taken from varying view points with complex backgrounds, in contrast of the previous public datasets [16,8].…”
Section: Previous Datasets (mentioning)
confidence: 99%
“…Some of the most popular descriptors have been based on the gradient of the appearance information (Schuldt et al 2004; Laptev and Perez 2007), spatio-temporal extensions to SIFT and SURF descriptors (Scovanner et al 2007; Willems et al 2008) and 2D motion information (Dalal et al 2006; Messing et al 2009). However, there has been little previous work on feature descriptors including depth information (which is generally encoded directly at the holistic level, with the aid of user masks).…”
Section: Related Work (mentioning)
confidence: 99%
“…Once salient parts of the sequence have been detected, various local feature descriptors are generally extracted from these regions. Local features which have proved effective in the past include gradient based appearance information [28,17], 2D motion information [22,5] and spatio-temporal extensions to SIFT and SURF descriptors [29,33]. For "in the wild" action recognition, the use of the Hollywood-3D dataset has prompted the investigation of local features based on 3D information.…”
Section: Related Work (mentioning)
confidence: 99%