2018
DOI: 10.18178/ijmlc.2018.8.3.699

Local Feature Extraction from RGB and Depth Videos for Human Action Recognition

Cited by 15 publications (5 citation statements)
References 22 publications
“…Rawya et al. [46] used the MSR-Daily Activity 3D dataset and the Online RGBD Action dataset. Spatio-temporal features were extracted using a Bag-of-Features (BoF) approach: points of interest were detected and motion history images were created to perform this research work.…”
Section: RGB-D Datasets, Methodology, Classification Results
confidence: 99%
“…The Expectation step (E-step) computes the posterior probability of each data point belonging to each component of the GMM, given the current estimates of the parameters. Mathematically, this is given by:

P(z_i = k | x_i, θ) = π_k N(x_i | μ_k, Σ_k) / Σ_j π_j N(x_i | μ_j, Σ_j)   (8)

where z_i is the latent variable indicating the elliptical component assignment for data point x_i, θ is the set of all parameters (π, μ, Σ) of the GMM, and N(x_i | μ_k, Σ_k) is the probability density function of the normal distribution for the k-th ellipse. The Maximization step (M-step) updates the estimates of the parameters by maximizing the expected complete-data log-likelihood, given the posterior probabilities computed in the E-step.…”
Section: GMM-EM-based Elliptical Modeling
confidence: 99%
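The E-step posterior in Eq. (8) can be sketched in a few lines of NumPy; this is an illustrative implementation of the standard GMM responsibility computation, not the cited paper's code, and all function names and parameters here are placeholders.

```python
import numpy as np

def gaussian_pdf(X, mu, sigma):
    # Multivariate normal density N(x | mu, sigma), evaluated row-wise on X (n, d).
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(sigma)
    # Mahalanobis term diff^T Sigma^-1 diff for each row of X.
    expo = -0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(expo) / norm

def e_step(X, pis, mus, sigmas):
    # Responsibilities P(z_i = k | x_i, theta) as in Eq. (8):
    # numerator pi_k * N(x_i | mu_k, Sigma_k), normalized over all components j.
    n, K = X.shape[0], len(pis)
    resp = np.zeros((n, K))
    for k in range(K):
        resp[:, k] = pis[k] * gaussian_pdf(X, mus[k], sigmas[k])
    resp /= resp.sum(axis=1, keepdims=True)
    return resp
```

By construction each row of the returned matrix sums to one, which is the property the subsequent M-step relies on when re-estimating π, μ, and Σ.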
“…Working with drone cameras is more challenging than working with grounded cameras because in the former case the background is not static and there is a greater chance of noise being introduced into the data. Many previous works have focused on recognizing actions captured by conventional Red-Green-Blue (RGB) video cameras, e.g., [8] and [9]. However, these works have limitations in coping with varying lighting conditions and cluttered backgrounds, because RGB data suffer from these variations.…”
Section: Introduction
confidence: 99%
“…Rawya Al-Akam and Dietrich Paulus investigated a new method for detecting human activities in 3D videos using RGB and depth data. The method suggested in [13] involves using Bag-of-Features techniques to extract local spatio-temporal features from all frames of a video and to distinguish between human activities. To achieve this, K-means clustering and multi-class Support Vector Machines are utilized for classification, and the system is designed to be invariant to scale, rotation, and lighting.…”
Section: Page 201
confidence: 99%
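The pipeline described in this excerpt (local descriptors → K-means codebook → per-video histogram → multi-class SVM) can be sketched roughly as follows. This is a minimal illustration with synthetic toy descriptors, assuming scikit-learn; the descriptor extraction, cluster count, and kernel choice are placeholders, not the authors' exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_codebook(descriptor_sets, k=8, seed=0):
    # Pool the local descriptors of all training videos and cluster them
    # into k visual words; the cluster centers form the BoF codebook.
    all_desc = np.vstack(descriptor_sets)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(all_desc)

def bof_histogram(descriptors, codebook):
    # Quantize one video's descriptors against the codebook and return
    # a normalized word-frequency histogram (the video's BoF vector).
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Hypothetical toy data: each "video" is a set of 16-D local spatio-temporal
# descriptors; two action classes drawn around different centers.
rng = np.random.default_rng(0)
videos = [rng.normal(loc=c, size=(30, 16)) for c in (0.0, 0.0, 3.0, 3.0)]
labels = [0, 0, 1, 1]

codebook = build_codebook(videos)
X = np.array([bof_histogram(v, codebook) for v in videos])
clf = SVC(kernel="linear", decision_function_shape="ovr").fit(X, labels)  # multi-class SVM
```

Because the SVM operates on normalized histograms rather than raw pixels, the representation is insensitive to the absolute number of detected interest points per video, which is one reason BoF pipelines tolerate scale and duration variation.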