Proceedings of the British Machine Vision Conference 2012
DOI: 10.5244/c.26.123
Learning discriminative space-time actions from weakly labelled videos

Abstract: Current state-of-the-art action classification methods extract feature representations from the entire video clip in which the action unfolds; however, this representation may include irrelevant scene context and movements that are shared among multiple action classes. For example, a waving action may be performed whilst walking, but if the walking movement and scene context also appear in other action classes, they should not be included in a waving-movement classifier. In this work, we propose an actio…
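
The abstract describes learning discriminative space-time action parts from weakly labelled videos, which is naturally cast as multiple instance learning (MIL): each video is a bag of sub-volume descriptors, and only the bag carries an action label. The sketch below is a minimal, hypothetical illustration of that idea using a max-scoring linear instance model trained by sub-gradient descent on toy data; the dimensions, learning rate, and max-pooling rule are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def bag_score(w, b, instances):
    """Score a bag (video) as the maximum over its instance (sub-volume) scores."""
    return np.max(instances @ w + b)

# Toy data: 20 videos, each a bag of 15 sub-volume descriptors of dimension 64.
bags = [rng.normal(size=(15, 64)) for _ in range(20)]
labels = rng.choice([-1.0, 1.0], size=20)   # weak, video-level labels only

w, b, lr = np.zeros(64), 0.0, 0.1
for _ in range(50):                          # sub-gradient steps on a hinge loss
    for X, y in zip(bags, labels):
        i = int(np.argmax(X @ w + b))        # most discriminative instance
        if y * (X[i] @ w + b) < 1.0:         # bag-level margin violated
            w += lr * y * X[i]
            b += lr * y
        w *= 1.0 - lr * 1e-3                 # L2 regularisation via shrinkage

print(bag_score(w, b, bags[0]))              # score for the first video's bag
```

Because the bag score is a max over instances, irrelevant sub-volumes (shared scene context, incidental movements) can receive low scores without penalising the video's label.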

Cited by 34 publications (32 citation statements)
References 32 publications (62 reference statements)
Citing publications: 2014–2020

“…A video is represented as a bag of kinematic modes. Closely related to our work is the work of Sapienza et al. [16], where discriminative action subvolumes are learned in a weakly supervised setting. The learned models are used to classify and localize actions.…”
Section: Related Work (mentioning)
confidence: 99%
“…Instead of using a subvolume representation, we use trajectories, extract trajectory groups [14] from the video, and aim to learn the discriminative trajectory groups that represent the video. Most importantly, our representation maintains the structural spatio-temporal information in each bag, whereas [16] treated each instance in a bag independently.…”
Section: Related Work (mentioning)
confidence: 99%
“…Moreover, discriminative space-time video patches have also been exploited for action recognition [37]. However, action recognition approaches are … Person re-identification challenges in public space scenes [42].…”
Section: Introduction (mentioning)
confidence: 99%
“…Laptev et al. attempt to mitigate this by splitting the spatio-temporal volume into sub-blocks, creating a descriptor for each sub-block, and concatenating them to create the sequence descriptor (Laptev et al. 2008). Sapienza et al. follow a similar vein, encoding individual sub-sequences; however, rather than concatenating them to create a single descriptor, they employ Multiple Instance Learning (MIL) (Sapienza et al. 2012). This accounts for some parts of the sequence being irrelevant, for example before and after the action.…”
Section: Related Work (mentioning)
confidence: 99%
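
To make the contrast in this excerpt concrete, here is a small, hypothetical sketch of the two representations it compares: concatenating sub-block descriptors into one fixed-length sequence descriptor versus keeping them as separate instances in a bag for MIL. The temporal split into four blocks, the 64-d features, and the mean pooling are illustrative assumptions, not either paper's exact pipeline.

```python
import numpy as np

def subblock_descriptors(frame_feats, n_blocks):
    """Split per-frame features into temporal sub-blocks and average each one."""
    return [blk.mean(axis=0) for blk in np.array_split(frame_feats, n_blocks)]

frames = np.random.rand(120, 64)   # toy video: 120 frames, 64-d features each

# Laptev et al. (2008) style: concatenate sub-block descriptors into a single
# sequence descriptor, so every sub-block always contributes to the encoding.
concatenated = np.concatenate(subblock_descriptors(frames, 4))   # shape (256,)

# MIL style (Sapienza et al. 2012): keep the sub-block descriptors as separate
# instances in a bag; a MIL classifier can then down-weight irrelevant
# sub-sequences, e.g. the frames before and after the action.
bag = subblock_descriptors(frames, 4)   # list of four 64-d instance descriptors
```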