2012 19th IEEE International Conference on Image Processing
DOI: 10.1109/icip.2012.6466966

Dominant spatio-temporal modulations and energy tracking in videos: Application to interest point detection for action recognition

Abstract: The presence of multiband amplitude and frequency modulations (AM-FM) in wideband signals, such as textured images or speech, has led to the development of efficient multicomponent modulation models for low-level image and sound analysis. Moreover, compact yet descriptive representations have emerged by tracking, through non-linear energy operators, the dominant model components across time, space or frequency. In this paper, we propose a generalization of such approaches in the 3D spatio-temporal domain and e…
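The non-linear energy operators referred to above are of the Teager-Kaiser type, Ψ[s(t)] ≡ [ṡ(t)]² − s(t)s̈(t), as quoted in one of the citation contexts below. A minimal discrete-time sketch in Python, using the standard three-sample approximation x[n]² − x[n−1]·x[n+1] (the function name and the AM-FM test signal are illustrative assumptions, not code from the paper):

import numpy as np

def teager_kaiser_energy(x):
    # Discrete Teager-Kaiser energy operator:
    # Psi[x[n]] = x[n]^2 - x[n-1] * x[n+1]
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

# For an AM-FM signal a(t) * cos(phi(t)), the output approximately tracks
# a(t)^2 times the squared instantaneous frequency (in radians per sample).
t = np.linspace(0.0, 1.0, 1000)
x = (1.0 + 0.5 * np.sin(2 * np.pi * 3 * t)) * np.cos(2 * np.pi * (50 * t + 20 * t ** 2))
energy = teager_kaiser_energy(x)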

Cited by 5 publications (6 citation statements)
References 16 publications
“…1 Dominant component analysis on the outputs of Gabor filterbanks has been used for 2D texture analysis and segmentation in [91,90,92] and for spatio-temporal action classification in [93,94]. It may include additional steps of demodulation.…”
Section: Postprocessing
confidence: 99%
“…1, the energy outputs of all 400 filters are handled by some operator in order to obtain the final energy map of each video. We used some ideas from Dominant Energy Analysis (DEA), as in [12], where the energy of the most dominant channel is considered as the energy value in each voxel:…”
Section: Ψ[s(t)] ≡ [ṡ(t)]² − s(t)s̈(t)
confidence: 99%
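A rough sketch of the per-voxel dominant-channel selection described in the statement above, assuming the per-channel 3D energy volumes (e.g., Teager-type energies of Gabor filterbank outputs) have already been computed; the array layout and function name are assumptions for illustration, not the cited implementation:

import numpy as np

def dominant_energy_analysis(channel_energies):
    # channel_energies: shape (K, T, H, W) -- one energy volume per filter channel.
    e = np.asarray(channel_energies)
    dominant_energy = e.max(axis=0)       # energy of the most dominant channel at each voxel
    dominant_channel = e.argmax(axis=0)   # index of that channel (useful for later demodulation)
    return dominant_energy, dominant_channel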
“…Video representations in terms of such features exhibit efficiency in distinguishing among action classes, while bypassing the need for precise background subtraction or tracking. Local image and video features have been successfully used for many tasks such as object and scene recognition [23] as well as human action recognition [9,12,24,32]. Local spatio-temporal features are able to capture characteristic shape and motion in video.…”
Section: Introduction
confidence: 99%
“…Traditionally, research in behavior and affect analysis has focused on recognizing behavioral cues such as smiles, head nods, and laughter (Déniz et al. 2008; Kawato and Ohya 2000; Lockerd and Mueller 2002), pre-defined posed human actions (e.g., walking, running, and hand-clapping) (Dollár et al. 2005; Niebles et al. 2008; Georgakis et al. 2012) or discrete, basic emotional states (e.g., happiness, sadness) (Pantic and Rothkrantz 2000; Cohen et al. 2003; Littlewort et al. 2006), mainly from posed data acquired in laboratory settings. However, these models are deemed unrealistic as they are unable to capture the temporal evolution of non-basic, possibly atypical, behaviors and subtle affective states exhibited by humans in naturalistic settings.…”
Section: Introduction
confidence: 99%