2021 18th International Multi-Conference on Systems, Signals & Devices (SSD)
DOI: 10.1109/ssd52085.2021.9429429
3D CNN for Human Action Recognition

Cited by 7 publications (7 citation statements)
References 30 publications
“…Therefore, 3D CNNs can learn spatiotemporal features for continuous frame data. However, 3D CNNs have the disadvantage of requiring a large amount of computation for training due to large frame datasets and 3D convolutional filters [15].…”
Section: Related Work
confidence: 99%
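To make the computational argument in this citation concrete, here is a minimal PyTorch sketch (our illustration, not code from the cited work) comparing the weight counts of a 2D and a 3D convolutional layer with otherwise matching settings; the extra temporal kernel dimension multiplies both the parameters and the activations to be processed.

```python
import torch
import torch.nn as nn

# 2D convolution: 3x3 kernel over a single frame's feature map
conv2d = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)

# 3D convolution: 3x3x3 kernel over a clip (adds a temporal dimension)
conv3d = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=3, padding=1)

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(conv2d))  # 3*64*3*3 + 64   = 1,792 weights
print(params(conv3d))  # 3*64*3*3*3 + 64 = 5,248 weights (~3x)

# Activations grow too: a 16-frame clip costs ~16x the 2D feature maps.
clip = torch.randn(1, 3, 16, 112, 112)   # (batch, channels, frames, H, W)
print(conv3d(clip).shape)                 # torch.Size([1, 64, 16, 112, 112])
```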
“…It computes the center x of x1 and x2 and the center y of y1 and y2, returning the coordinates (x, y). Euclidean_Distance calculates the Euclidean distance between the centroid coordinates of detected individuals (lines 9-11). Grouping clusters adjacent centroid coordinates (lines 12-19). Depending on the number of individuals detected in the current frame, the Euclidean distance of the centroid coordinates is calculated.…”
Section: Object Grouping Algorithm
confidence: 99%
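The citing paper's pseudocode is not reproduced in the quote, so the following NumPy sketch is a reconstruction under assumptions: the helper names centroid, euclidean_distance and group_adjacent, and the 50-pixel threshold, are ours, not from the cited paper. It shows the centroid and distance computations the citation walks through.

```python
import numpy as np

def centroid(box):
    """Center (x, y) of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def euclidean_distance(c1, c2):
    """Euclidean distance between two centroid coordinates."""
    return float(np.hypot(c1[0] - c2[0], c1[1] - c2[1]))

def group_adjacent(boxes, threshold=50.0):
    """Greedily cluster detections whose centroids lie within threshold pixels."""
    centers = [centroid(b) for b in boxes]
    groups = []
    for i, c in enumerate(centers):
        placed = False
        for g in groups:
            if any(euclidean_distance(c, centers[j]) < threshold for j in g):
                g.append(i)
                placed = True
                break
        if not placed:
            groups.append([i])
    return groups

print(group_adjacent([(0, 0, 10, 10), (5, 5, 15, 15), (200, 200, 220, 220)]))
# [[0, 1], [2]] -> the first two detections are grouped, the third stands alone
```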
“…Others have gone one step further by proposing 3D architectures that include a variety of hidden layers, consisting of 3D convolutional, dropout, 3D pooling and fully connected layers for spatio-temporal consideration of third-person perspective video data (Almaadeed et al 2019; Basha, Pulabaigari, et al 2020; Boualia and Amara 2021; Wan et al 2020). These state-of-the-art models have reached classification performances of over 90% on well-known datasets such as UCF101 (Soomro et al 2012), HMDB51 (Kuehne et al 2011) and KTH (Schüldt et al 2004).…”
Section: Action Recognition
confidence: 99%
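For readers unfamiliar with the layer stack this citation lists, the sketch below is a minimal, illustrative 3D CNN in PyTorch combining 3D convolutional, 3D pooling, dropout and fully connected layers; all layer sizes are assumptions, not taken from any of the cited architectures.

```python
import torch
import torch.nn as nn

class Simple3DCNN(nn.Module):
    def __init__(self, num_classes=101):  # e.g. 101 classes for UCF101
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),          # halves frames, H and W
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(64 * 4 * 28 * 28, num_classes),  # for 16x112x112 clips
        )

    def forward(self, x):          # x: (batch, 3, frames, H, W)
        return self.classifier(self.features(x))

model = Simple3DCNN()
logits = model(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)                # torch.Size([2, 101])
```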
“…The majority of these video-based HAR methods use a combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to extract image-based features, as well as to leverage temporal relationships between video frames (Basha et al 2020a, b; Dai et al 2020; Garcia-Hernando et al 2018; Kapidis et al 2021; Ng et al 2015). When trained with large quantities of training data from publicly available action recognition datasets, neural network-based models have been able to reach recognition accuracies of over 90% for everyday actions (Boualia and Amara 2021; Wan et al 2020).…”
Section: Introduction
confidence: 99%
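The CNN-plus-RNN pattern this citation describes can be sketched in a few lines of PyTorch. The model below is our illustration with assumed sizes, not code from the cited papers: a small 2D CNN encodes each frame, and an LSTM models the temporal relationships across the resulting feature sequence.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, num_classes=10, feat_dim=128, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(                 # tiny per-frame encoder
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, feat_dim),
        )
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, clip):                      # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1))      # (B*T, feat_dim)
        feats = feats.view(b, t, -1)              # back to (B, T, feat_dim)
        _, (h, _) = self.rnn(feats)               # h: (1, B, hidden)
        return self.fc(h[-1])                     # classify from last state

logits = CNNLSTM()(torch.randn(2, 16, 3, 64, 64))
print(logits.shape)                               # torch.Size([2, 10])
```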
“…Boualia and Amara [29] submitted video image frames to a 3D CNN to describe the action involved without preprocessing or feature extraction. Mishra et al [30] identified regions of interest in the input video image frames, then used MHI and motion energy images to extract features.…”
Section: State of the Art
confidence: 99%
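As background on the features Mishra et al are said to use, the following NumPy sketch (our illustration, with an assumed decay constant tau and difference threshold) computes a simple motion history image (MHI): pixels that moved recently carry the largest values, and older motion decays toward zero.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=15, thresh=30):
    """One MHI step: set moving pixels to tau, decay the rest by 1."""
    motion = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16)) > thresh
    return np.where(motion, tau, np.maximum(mhi - 1, 0))

# Toy example: a bright square moves one pixel right per frame.
frames = []
for t in range(5):
    f = np.zeros((32, 32), dtype=np.uint8)
    f[10:20, 5 + t:15 + t] = 255
    frames.append(f)

mhi = np.zeros((32, 32), dtype=np.int16)
for prev, cur in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev, cur)

print(mhi.max(), (mhi > 0).sum())  # most recent motion has the highest values
```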