2011
DOI: 10.1007/s12559-011-9097-0

Spatiotemporal Features for Action Recognition and Salient Event Detection

Abstract: Although the mechanisms of human visual understanding remain partially unclear, computational models inspired by existing knowledge on human vision have emerged and been applied to several fields. In this paper, we propose a novel method to compute visual saliency from video sequences by taking into account the actual spatiotemporal nature of the video. The visual input is represented by a volume in space-time and decomposed into a set of feature volumes in multiple resolutions. Feature competition is used to produce a saliency…
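The pipeline the abstract outlines (space-time volume, multi-resolution feature volumes, feature competition, saliency volume) can be sketched briefly. The sketch below is a minimal illustration under assumed feature choices (intensity, temporal gradient, centre-surround contrast) and an Itti-style max-mean competition rule; it is not the authors' exact model.

# A minimal sketch of the pipeline the abstract describes, not the authors'
# exact model. Feature choices (intensity, temporal gradient, centre-surround
# contrast) and the max-mean competition rule are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def feature_volumes(video):
    """video: (T, H, W) grayscale float array -> dict of feature volumes."""
    motion = np.abs(np.gradient(video, axis=0))          # temporal gradient
    contrast = np.abs(gaussian_filter(video, 1.0) - gaussian_filter(video, 4.0))
    return {"intensity": video, "motion": motion, "contrast": contrast}

def compete(vol, eps=1e-8):
    """Itti-style normalization: boost volumes with few isolated peaks."""
    vol = (vol - vol.min()) / (np.ptp(vol) + eps)        # rescale to [0, 1]
    return vol * (vol.max() - vol.mean()) ** 2           # competition weight

def saliency_volume(video, n_scales=3):
    """Sum competed feature volumes over a 3-D multiresolution pyramid."""
    sal = np.zeros_like(video)
    for vol in feature_volumes(video).values():
        for s in range(n_scales):
            coarse = zoom(vol, 1.0 / 2 ** s, order=1)    # coarser space-time copy
            up = zoom(coarse, [t / c for t, c in zip(vol.shape, coarse.shape)],
                      order=1)                           # back to full resolution
            sal += compete(up)
    return sal / (sal.max() + 1e-8)

# Toy usage: a moving bright blob should dominate the saliency volume.
video = np.zeros((16, 32, 32))
for t in range(16):
    video[t, 10:14, t:t + 4] = 1.0                       # blob drifting right
S = saliency_volume(video)
print(S.shape, S[8, 11, 11] > S[8, 2, 28])               # blob vs. background

On this toy clip, the competed sum concentrates saliency along the blob's space-time trajectory, which is the qualitative behavior the abstract describes.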

Cited by 17 publications (11 citation statements)
References 52 publications (59 reference statements)
“…Second, we demonstrated that our method detects a wide spectrum of events as something significantly different from 'normal' states and also identifies their time intervals (e.g., N_h and M_h in the hierarchy of Figure 4), which cannot be obtained with the reference method [10]. Moreover, the proposed algorithm requires extremely small datasets (on the order of 10 to 50 frames) in the training phase, whereas conventional methods in general need a huge amount of video data to learn action categories (e.g., normal states) [1, 7, 10, 13, 15, 20]. For future work, we will extend the proposed framework to multi-category event detection with explicit categorization.…”
Section: Discussion (mentioning)
confidence: 99%
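The excerpt above reads as a novelty-detection scheme: model the 'normal' state from a small training set and report the time intervals that deviate from it. A minimal illustration of that idea follows; it is hypothetical, and the cited paper's actual hierarchical algorithm is not reproduced here.

# Illustrative novelty-detection reading of the excerpt, not the cited
# paper's algorithm: fit a Gaussian to per-frame features from a small
# 'normal' training set, then report time intervals whose frames deviate.
import numpy as np

def fit_normal(features):
    """features: (n_frames, d) array from a small 'normal' training set."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return mu, np.linalg.inv(cov)

def event_intervals(features, mu, prec, thresh=30.0):
    """(start, end) frame index pairs where squared Mahalanobis > thresh."""
    d = features - mu
    score = np.einsum("ij,jk,ik->i", d, prec, d)         # per-frame distance
    flags = np.concatenate(([0], (score > thresh).astype(int), [0]))
    edges = np.flatnonzero(np.diff(flags))
    return list(zip(edges[::2], edges[1::2]))            # end index exclusive

# Toy usage: ~40 training frames suffice, echoing the 10-50 frame claim.
rng = np.random.default_rng(1)
normal = rng.normal(size=(40, 8))                        # 'normal' frames
test = rng.normal(size=(200, 8))
test[70:90] += 4.0                                       # injected event
mu, prec = fit_normal(normal)
print(event_intervals(test, mu, prec))                   # expect ~[(70, 90)]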
“…Most action (event) detection methods extract features such as optical-flow-based features [12, 20], spatio-temporal features [5, 10-15, 27], or static features including appearance, shape, and spatial relations of local features [8, 18]. After extracting those primitive features, some unsupervised approaches use a codebook representation [2, 13, 19], which is effective for describing and discriminating various event categories.…”
Section: Introduction (mentioning)
confidence: 99%
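The codebook representation mentioned in this excerpt is the standard bag-of-visual-words construction: cluster local descriptors into codewords, then describe each clip as a histogram of codeword assignments. A minimal sketch, assuming k-means clustering and random vectors standing in for real spatio-temporal descriptors:

# Hypothetical sketch of the codebook (bag-of-visual-words) representation:
# cluster local descriptors into codewords, then describe each clip as a
# histogram of codeword assignments. Random vectors stand in for real
# spatio-temporal descriptors; k-means is one clustering choice among many.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
training_descriptors = rng.normal(size=(2000, 64))       # stand-in descriptors

codebook = KMeans(n_clusters=100, n_init=10, random_state=0)
codebook.fit(training_descriptors)

def encode(clip_descriptors, codebook):
    """L1-normalized histogram of nearest-codeword assignments."""
    words = codebook.predict(clip_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

clip = rng.normal(size=(150, 64))                        # one clip's descriptors
print(encode(clip, codebook).shape)                      # (100,) feature vector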
“…To demonstrate generalisability, this method has been systematically tested on a variety of datasets and shown to be more effective and accurate for action recognition. Rapantzikos et al. (2011) developed a method to compute visual saliency from video sequences by taking into account the actual spatiotemporal nature of the video. The visual input is represented by a volume in space-time.…”
Section: Related Work (mentioning)
confidence: 99%
“…across time or the perceptual scene. In recent years, many computational frameworks have been proposed for attention and saliency modeling, since they play a significant role in various multimedia applications, such as action recognition [7-9], behavioral analysis, and movie summarization [10-12].…”
Section: Introduction (mentioning)
confidence: 99%