Spatio-Temporal Frames in a Bag-of-Visual-Features Approach for Human Actions Recognition

Lopes, Amauri; Oliveira, Rodrigo Silva de; Almeida, Jussara M.; Araújo, Arnaldo de Albuquerque

doi:10.1109/sibgrapi.2009.17

Cited by 9 publications

(5 citation statements)

References 27 publications


“…The use of space-time features is presented in the work [7]. The Bag of Visual Features (BoVF) method is used, which consists of the following steps: characteristic points detection and description (SIFT algorithm), creating a dictionary of the extracted features (using clustering), assigning each detected point to a word from the dictionary (using the smallest distance criterion) and creating a histogram of used "visual words".…”

Section: Human Action Recognition Approaches Review (mentioning)

confidence: 99%
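
The Bag of Visual Features steps quoted above (build a dictionary by clustering, assign each descriptor to its nearest word, count the words) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy 2-D tuples stand in for real SIFT descriptors, the tiny k-means and the function names `kmeans` and `bovf_histogram` are hypothetical, and detection and description of the points are assumed to have happened already.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, k, iterations=10):
    """Tiny k-means: returns k centroids, used here as the visual dictionary."""
    # Assumption: seed the centroids with the first k points so the
    # result is deterministic; real systems usually seed randomly.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def bovf_histogram(descriptors, dictionary):
    """Assign each descriptor to its nearest word and count occurrences."""
    hist = [0] * len(dictionary)
    for d in descriptors:
        word = min(range(len(dictionary)), key=lambda j: dist(d, dictionary[j]))
        hist[word] += 1
    return hist


# Toy 2-D "descriptors" standing in for high-dimensional SIFT vectors.
training = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0)]
dictionary = kmeans(training, k=2)

video = [(0.5, 0.5), (9.0, 10.0), (0.2, 0.1)]
histogram = bovf_histogram(video, dictionary)
print(histogram)  # [2, 1]
```

The resulting histogram is the fixed-length representation that a classifier would consume; its length equals the dictionary size regardless of how many points were detected.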

Human action recognition using simple geometric features and a finite state machine

Dudziński

Kryjak

Mikrut

2013

Image Processing & Communications


“…A pixel G_i(t) at level G_i maps to a pixel G_0(2^i t). For example, G_1(3) mapped to G_0(6) and G_2(1) mapped to G_0(4).…”

Section: Spatio-temporal Difference Of Gaussian Pyramid (mentioning)

confidence: 99%
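
The index mapping in the statement above can be checked with a few lines of code. This is a sketch under the assumption, implied by the quoted examples, that each pyramid level halves the sampling rate of the one below it; the function name `to_base_index` is hypothetical.

```python
def to_base_index(level, t):
    """Map index t at pyramid level `level` to its index at level 0.

    Assumes dyadic downsampling: one step at level i spans 2**i
    steps at level 0.
    """
    if level < 0 or t < 0:
        raise ValueError("level and t must be non-negative")
    return (2 ** level) * t


# The two examples given in the quoted statement.
print(to_base_index(1, 3))  # 6
print(to_base_index(2, 1))  # 4
```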

“…The assumption here is that spatio-temporal events can be described by common interest points between the spatial axis (appearance information) and the temporal axis (motion information). Lopes et al presented an approach to forming a spatio-temporal volume by stacking a set of frames from a video signal [6]. There are three directions to slice this volume into planes, as illustrated in Figure 3.…”

Section: Interest Points Detection (mentioning)

confidence: 99%
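
The idea of stacking frames into a volume and slicing it along three directions can be illustrated with nested lists. This is a minimal sketch, not the method of either paper: the `volume[t][y][x]` layout, the helper names, and the toy pixel values are assumptions chosen so that each slice is easy to read. The plane names xy, xt and yt follow the citing paper's abstract.

```python
def make_volume(frames):
    """Stack equally sized frames (lists of rows) into volume[t][y][x]."""
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must have the same size")
    return [[list(row) for row in frame] for frame in frames]


def xy_slice(volume, t):
    """Spatial plane at time t (an ordinary frame): rows are y, columns are x."""
    return volume[t]


def xt_slice(volume, y):
    """Plane along time at a fixed row y: rows are t, columns are x."""
    return [frame[y] for frame in volume]


def yt_slice(volume, x):
    """Plane along time at a fixed column x: rows are t, columns are y."""
    return [[row[x] for row in frame] for frame in volume]


# Toy video: 2 frames of 2 rows by 3 columns; each pixel encodes its own
# coordinates as 100*t + 10*y + x so the slices are easy to verify.
frames = [[[100 * t + 10 * y + x for x in range(3)] for y in range(2)]
          for t in range(2)]
volume = make_volume(frames)

print(xy_slice(volume, 1))  # [[100, 101, 102], [110, 111, 112]]
print(xt_slice(volume, 1))  # [[10, 11, 12], [110, 111, 112]]
print(yt_slice(volume, 2))  # [[2, 12], [102, 112]]
```

An interest point detector designed for 2-D images can then be run on each kind of slice, so that the xt and yt planes contribute motion information alongside the appearance information in the xy planes.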


Spatio-temporal SIFT and Its Application to Human Action Classification

Ghamdi

Zhang

Gotoh

2012

Computer Vision – ECCV 2012. Workshops and Demonstrations


Abstract. This paper presents a space-time extension of scale-invariant feature transform (SIFT) originally applied to the 2-dimensional (2D) volumetric images. Most of the previous extensions dealt with 3-dimensional (3D) spatial information using a combination of a 2D detector and a 3D descriptor for applications such as medical image analysis. In this work we build a spatio-temporal difference-of-Gaussian (DoG) pyramid to detect the local extrema, aiming at processing video streams. Interest points are extracted not only from the spatial plane (xy) but also from the planes along the time axis (xt and yt). The space-time extension was evaluated using the human action classification task. Experiments with the KTH and the UCF sports datasets show that the approach was able to produce results comparable to the state of the art.


“…However, there are still some unsolved issues such as background clutter, viewpoint variation, illumination change and class variability [2]. Recently, significant progress has been demonstrated with spatio-temporal feature representation along with variations of the most popular and widely used bag of visual words approaches (BoVW) [3], which have the ability to handle viewpoint independence, occlusion and scale invariance [4,5]. Therefore, there has been a growing interest in exploring the potential of possible variants of the classical BoVW approach, which characterizes actions using a histogram of feature occurrence after clustering [6].…”

Section: Introduction (mentioning)

confidence: 99%

Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) Model for Human Action Recognition

Nazir

Yousaf

Nebel

et al.

2019

Sensors


Human action recognition (HAR) has emerged as a core research domain for video understanding and analysis, thus attracting many researchers. Although significant results have been achieved in simple scenarios, HAR is still a challenging task due to issues associated with view independence, occlusion and inter-class variation observed in realistic scenarios. In previous research efforts, the classical bag of visual words approach along with its variations has been widely used. In this paper, we propose a Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) model for human action recognition without compromising the strengths of the classical bag of visual words approach. Expressions are formed based on the density of a spatio-temporal cube of a visual word. To handle inter-class variation, we use class-specific visual word representation for visual expression generation. In contrast to the Bag of Expressions (BoE) model, the formation of visual expressions is based on the density of spatio-temporal cubes built around each visual word, as constructing neighborhoods with a fixed number of neighbors could include non-relevant information making a visual expression less discriminative in scenarios with occlusion and changing viewpoints. Thus, the proposed approach makes the model more robust to occlusion and changing viewpoint challenges present in realistic scenarios. Furthermore, we train a multi-class Support Vector Machine (SVM) for classifying bag of expressions into action classes. Comprehensive experiments on four publicly available datasets (KTH, UCF Sports, UCF11 and UCF50) show that the proposed model outperforms existing state-of-the-art human action recognition methods in terms of accuracy, reaching 99.21%, 98.60%, 96.94% and 94.10%, respectively.
