Retrieving actions in movies

Laptev, Ivan; Pérez, Patrick

doi:10.1109/iccv.2007.4409105

Cited by 353 publications

(331 citation statements)

References 27 publications

(52 reference statements)

“…As described in Section 2, there exist several public datasets that are widely referred and tested in the literature of human activity recognition [8,16,13,5,20]. However, all these datasets (except the Soccer dataset) are taken from an approximate side view and they have human figures presented in high-resolution imagery (Fig.…”

Section: Aerial View Activity Classification Challenge (mentioning)

confidence: 99%

“…Recently, more challenging datasets were constructed by collecting realistic videos from movies [13,12,14]. These movie scenes are taken from varying view points with complex backgrounds, in contrast of the previous public datasets [16,8].…”

Section: Previous Datasets (mentioning)

confidence: 99%

“…Even though there exist other public datasets composed of human action videos [16,8,20,13] (Fig. 3 (a-e)), most of them focus on recognition of simple actions (e.g.…”

Section: Introduction (mentioning)

confidence: 99%

An Overview of Contest on Semantic Description of Human Activities (SDHA) 2010

Ryoo, Chen, Aggarwal et al. 2010

Lecture Notes in Computer Science

Abstract. This paper summarizes results of the 1st Contest on Semantic Description of Human Activities (SDHA), in conjunction with ICPR 2010. SDHA 2010 consists of three types of challenges, High-level Human Interaction Recognition Challenge, Aerial View Activity Classification Challenge, and Wide-Area Activity Search and Recognition Challenge. The challenges are designed to encourage participants to test existing methodologies and develop new approaches for complex human activity recognition scenarios in realistic environments. We introduce three new public datasets through these challenges, and discuss results of state-of-the-art activity recognition systems designed and implemented by the contestants. A methodology using a spatio-temporal voting [19] successfully classified segmented videos in the UT-Interaction datasets, but had a difficulty correctly localizing activities from continuous videos. Both the method using local features [10] and the HMM based method [18] recognized actions from low-resolution videos (i.e. UT-Tower dataset) successfully. We compare their results in this paper.

“…Some of the most popular descriptors have been based on the gradient of the appearance information (Schuldt et al 2004;Laptev and Perez 2007), spatio-temporal extensions to SIFT and SURF descriptors (Scovanner et al 2007;Willems et al 2008) and 2D motion information (Dalal et al 2006;Messing et al 2009). However, there has been little previous work on feature descriptors including depth information (which is generally encoded directly at the holistic level, with the aid of user masks).…”

Section: Related Work (mentioning)

confidence: 99%

Hollywood 3D: What are the Best 3D Features for Action Recognition?

2016

Action recognition "in the wild" is extremely challenging, particularly when complex 3D actions are projected down to the image plane, losing a great deal of information. The recent growth of 3D data in broadcast content and commercial depth sensors, makes it possible to overcome this. However, there is little work examining the best way to exploit this new modality. In this paper we introduce the Hollywood 3D benchmark, which is the first dataset containing "in the wild" action footage including 3D data. This dataset consists of 650 stereo video clips across 14 action classes, taken from Hollywood movies. We provide stereo calibrations and depth reconstructions for each clip. We also provide an action recognition pipeline, and propose a number of specialised depth-aware techniques including five interest point detectors and three feature descriptors. Extensive tests allow evaluation of different appearance and depth encoding schemes. Our novel techniques exploiting this depth allow us to reach performance levels more than triple those of the best baseline algorithm using only appearance information. The benchmark data, code and calibrations are all made available to the community.

“…Once salient parts of the sequence have been detected, various local feature descriptors are generally extracted from these regions. Local features which have proved effective in the past include gradient based appearance information [28,17], 2D motion information [22,5] and spatio-temporal extensions to SIFT and SURF descriptors [29,33]. For "in the wild" action recognition, the use of the Hollywood-3D dataset has prompted the investigation of local features based on 3D information.…”

Section: Related Work (mentioning)

confidence: 99%

Natural Action Recognition Using Invariant 3D Motion Encoding

Hadfield, Lebeda, Bowden 2014

Computer Vision – ECCV 2014

Abstract. We investigate the recognition of actions "in the wild" using 3D motion information. The lack of control over (and knowledge of) the camera configuration, exacerbates this already challenging task, by introducing systematic projective inconsistencies between 3D motion fields, hugely increasing intra-class variance. By introducing a robust, sequence based, stereo calibration technique, we reduce these inconsistencies from fully projective to a simple similarity transform. We then introduce motion encoding techniques which provide the necessary scale invariance, along with additional invariances to changes in camera viewpoint. On the recent Hollywood 3D natural action recognition dataset, we show improvements of 40% over previous state-of-the-art techniques based on implicit motion encoding. We also demonstrate that our robust sequence calibration simplifies the task of recognising actions, leading to recognition rates 2.5 times those for the same technique without calibration. In addition, the sequence calibrations are made available.
