2014
DOI: 10.1016/j.cviu.2014.01.002
Action recognition using global spatio-temporal features derived from sparse representations

Abstract: Recognizing actions is one of the important challenges in computer vision with respect to video data, with applications to surveillance, diagnostics of mental disorders, and video retrieval. Compared to other data modalities such as documents and images, processing video data demands orders of magnitude higher computational and storage resources. One way to alleviate this difficulty is to focus the computations on informative (salient) regions of the video. In this paper, we propose a novel global spatio-tempo…

Cited by 45 publications (26 citation statements) | References 55 publications
“…Extraction of salient regions: commonly used methods for extracting salient/important features include the STIP method [8], sparse coding [7], and STFT. No extraction of the human region is performed here.…”
Section: Extracting Area of Importance/Target
confidence: 99%
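The quoted statement lists saliency-based preprocessing methods (STIP, sparse coding, STFT) without spelling out how a saliency map restricts later computation. As a rough illustration only, the sketch below uses a temporal frame-difference heuristic as the saliency proxy; it is not the STIP detector, the paper's sparse-representation method, or any cited pipeline, and `saliency_mask` and its parameters are hypothetical.

```python
# Minimal sketch (not a cited method): a crude spatio-temporal saliency
# map from temporal frame differences, used to keep only the most
# "active" pixels of a video before feature extraction.
import numpy as np

def saliency_mask(frames: np.ndarray, keep_fraction: float = 0.2) -> np.ndarray:
    """frames: (T, H, W) grayscale video, float32 in [0, 1].
    Returns a boolean (T, H, W) mask over the most salient pixels."""
    diff = np.abs(np.diff(frames, axis=0))           # temporal gradient, (T-1, H, W)
    diff = np.concatenate([diff[:1], diff], axis=0)  # pad back to length T
    thresh = np.quantile(diff, 1.0 - keep_fraction)  # keep the top fraction
    return diff >= thresh

# Usage: mask = saliency_mask(video); descriptors are then computed only
# where mask is True, shrinking the volume of data downstream stages see.
```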
“…However, how to fuse or integrate all streams is still an open question. Before deeply learned features became popular, there were many research approaches to video classification using various methods, especially handcrafted methods such as spatiotemporal features [1], dense trajectories [9], and local autocorrelation [19]. Three-dimensional (3D) CNN was the first attempt to train spatiotemporal features for video classification using deep CNNs.…”
Section: Related Work
confidence: 99%
“…1 Graduate School of Information Engineering, Hiroshima University, Higashi Hiroshima, Japan. 2 Department of Information Engineering, Hiroshima University, Higashi Hiroshima, Japan.…”
Section: Authors' Information
confidence: 99%
“…We use the seven translation […]. After the two feature sets are computed, the feature vectors are combined and encoded into a single code using the bag-of-features algorithm [13]. Unsupervised K-means clustering and supervised K-nearest-neighbor (KNN) classification are used to classify the different actions from the videos.…”
Section: Overview of Our Approach
confidence: 99%
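The quoted overview names a standard pipeline: quantize local descriptors into a bag-of-features histogram with K-means, then classify videos with KNN. The sketch below shows that pipeline with scikit-learn; the descriptor dimensions, vocabulary size, and toy data are assumptions for illustration, not values from the cited work.

```python
# Minimal sketch (assumed shapes, not the cited paper's exact pipeline):
# bag-of-features encoding of local descriptors followed by KNN action
# classification, as described in the quoted overview.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def encode_bof(descriptors: np.ndarray, kmeans: KMeans) -> np.ndarray:
    """Quantize one video's local descriptors (N, D) into a normalized
    histogram over the K visual words (the single 'code' per video)."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)

# Toy data: per-video descriptor sets and action labels (illustrative only).
rng = np.random.default_rng(0)
train_desc = [rng.normal(size=(200, 32)) for _ in range(20)]
train_y = rng.integers(0, 3, size=20)

kmeans = KMeans(n_clusters=50, n_init=10, random_state=0)
kmeans.fit(np.vstack(train_desc))                 # learn the visual vocabulary

X_train = np.array([encode_bof(d, kmeans) for d in train_desc])
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, train_y)

test_hist = encode_bof(rng.normal(size=(180, 32)), kmeans)
print(knn.predict(test_hist[None, :]))            # predicted action label
```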