2018
DOI: 10.1016/j.patrec.2017.12.024

A Bag of Expression framework for improved human action recognition

Abstract: The Bag of Words (BoW) approach has been widely used for human action recognition in recent state-of-the-art methods. In this paper, we introduce what we call a Bag of Expression (BoE) framework, based on the bag of words method, for recognizing human action in simple and realistic scenarios. The proposed approach includes space time neighborhood information in addition to visual words. The main focus is to enhance the existing strengths of the BoW approach like view independence, scale invariance and occlusio…

Cited by 49 publications (31 citation statements)
References: 28 publications

“…The C3D features are computed from RGB IAVID-1 videos and produce 48.77% and 40% prediction accuracy using SVM and CNN. The performance of C3D features is comparable to 2D CNN without considering temporal information for HAR at frame level [18]. Similarly, Bag of Expression [32] for HAR produces 26.67% recognition rate using handcrafted 3D-Harris and 3D-SIFT. Some recent silhouettes based HAR techniques are using MHIs described through HOG [20] and LBP-HOG [31] descriptor to recognize human activities through nearest neighbor and SVM classifiers.…”
Section: Comparison With State-of-the-art Methods (mentioning)
confidence: 92%
“…The proposed technique outperforms other techniques based on silhouettes in terms of precise recognition. To this end, we analyzed and compared our technique with methods [17,20,31,32] on the IAVID-I dataset. The C3D features are computed from RGB IAVID-1 videos and produce 48.77% and 40% prediction accuracy using SVM and CNN.…”
Section: Comparison With State-of-the-art Methods (mentioning)
confidence: 99%
“…Two different techniques have been proposed for dictionary building in [25]: modular dictionary and single dictionary. In [26,27] Nazir et al. proposed the dynamic spatio-temporal bag of expressions (D-STBoE) model and the BoE framework for action recognition which improves the existing strength of bag of words. A global feature ensemble representation is discussed by Chen et al. [18] who combined the HOG vehicle features extracted in a grid-based pattern.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, since BoVW approaches only consider the frequency of each visual word and ignore the spatio-temporal relation among visual words, such representation is not able to exploit the contextual relationship between visual words. To overcome this drawback, we propose an extension of our previously proposed Bag of Expressions (BoE) model [7], which incorporates contextual relationships of visual words while preserving inherent qualities of the classical BoVW approach. The main idea of visual expression formation is to store the spatio-temporal contextual information that is usually lost during formation of visual words by encoding neighboring spatio-temporal interest points’ information.…”
Section: Introduction (mentioning)
confidence: 99%
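
The contrast drawn in the last statement, word frequencies alone versus words plus their spatio-temporal neighbours, can be illustrated with a small sketch. This is not the BoE authors' implementation: the pairing rule (each interest point paired with its k nearest space-time neighbours), the function names, the toy codebook and the toy coordinates are all assumptions made for illustration only.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def nearest_word(descriptor, codebook):
    """Index of the codebook centroid closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))


def bow_histogram(words, vocab_size):
    """Classical bag of words: how often each visual word occurs."""
    counts = Counter(words)
    return [counts[w] for w in range(vocab_size)]


def expression_histogram(positions, words, k=1):
    """Hypothetical 'expression' counts: (word, neighbour word) pairs taken
    over each point's k nearest neighbours in (x, y, t) space."""
    pairs = Counter()
    for i, p in enumerate(positions):
        others = sorted(
            (j for j in range(len(positions)) if j != i),
            key=lambda j: dist(p, positions[j]),
        )
        for j in others[:k]:
            pairs[(words[i], words[j])] += 1
    return pairs


# Toy data (assumed): a 2-word codebook and four interest-point descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.0), (0.0, 0.2), (1.0, 0.8)]
words = [nearest_word(d, codebook) for d in descriptors]

# The same four words placed at two different (x, y, t) layouts.
layout_a = [(0, 0, 0), (1, 0, 0), (10, 10, 5), (11, 10, 5)]
layout_b = [(0, 0, 0), (10, 10, 5), (1, 0, 0), (11, 10, 5)]

bow = bow_histogram(words, len(codebook))     # identical for both layouts
expr_a = expression_histogram(layout_a, words)
expr_b = expression_histogram(layout_b, words)

print(bow)
print(dict(expr_a))
print(dict(expr_b))
```

Both layouts yield the same word histogram, but their pair counts differ, which is the kind of contextual information a frequency-only representation discards.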