Recognizing Human Actions Using 3D Skeletal Information and CNNs

Papadakis, Antonios E.; Mathe, Eirini; Vernikos, Ioannis; Maniatis, Apostolos; Spyrou, Evaggelos; Mylonas, Phivos

doi:10.1007/978-3-030-20257-6_44

Cited by 14 publications

(6 citation statements)

References 17 publications

“…It should be intuitive that (a) different activities require different amounts of time and (b) the same activity requires different amounts of time both in the cases it is performed by different subjects and even when it is performed by the same subject. As in [20], we used a linear interpolation step, setting the number of frames F as equal for all activity examples. Upon performing several experiments, we set […].…”

Section: Proposed Methodology (mentioning)

confidence: 99%
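The linear interpolation step quoted above can be illustrated with a minimal sketch. It assumes each activity example is a list of per-frame coordinate vectors and resamples it to a fixed number of frames. The function name `resample_frames`, the data layout, and the target length used in the example are assumptions for illustration; the value of F chosen by the authors is not given in the excerpt.

```python
def resample_frames(frames, target_len):
    """Linearly interpolate a sequence of per-frame vectors to target_len frames.

    `frames` is a list of equal-length lists of floats (one list per frame).
    The name and data layout are illustrative assumptions, not the authors' code.
    """
    n = len(frames)
    if n == 0 or target_len < 1:
        raise ValueError("need at least one input frame and target_len >= 1")
    if n == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for i in range(target_len):
        # Fractional position of output frame i on the input time axis.
        pos = i * (n - 1) / (target_len - 1)
        lo = min(int(pos), n - 1)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        # Blend the two neighbouring input frames coordinate by coordinate.
        out.append([(1.0 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


# A two-frame clip stretched to three frames (hypothetical values).
short_clip = [[0.0, 0.0], [2.0, 4.0]]
print(resample_frames(short_clip, 3))
```

The same call shortens long sequences, so every example ends up with the same number of frames regardless of how long the activity took.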

“…For classification, we used a Convolutional Neural Network (CNN). Specifically, the architecture of the CNN that has been used throughout our experiments has been experimentally defined and was initially used in previous works [20, 34]. It consists of a 2D convolutional layer that filters the input image with five kernels of […] size, a max-pooling layer that performs […] subsampling, two consecutive convolutional layers of size […] with 10 and 15 kernels, a max-pooling layer performing […] subsampling, a flattened layer that transforms the output of the last pooling layer into a vector, which consists of the input to a dense layer upon applying a dropout layer with a dropout rate equal to […] and a second dense layer producing the output of the network.…”

Section: Proposed Methodology (mentioning)

confidence: 99%
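To make the layer sequence in the statement above easier to follow, here is a small sketch that traces the tensor shape through each layer. Only the layer order and the kernel counts (5, 10, 15) come from the excerpt. The kernel size, pooling size, input size, dense width, and class count are elided or absent there, so they are left as parameters, and the example values are hypothetical. Unpadded convolutions with stride 1 and non-overlapping pooling are also assumptions.

```python
def conv_out(h, w, k):
    """Output height/width of an unpadded, stride-1 convolution (assumed)."""
    return h - k + 1, w - k + 1


def pool_out(h, w, p):
    """Output height/width of non-overlapping p x p max-pooling (assumed)."""
    return h // p, w // p


def trace_shapes(h, w, conv_k, pool_p, dense_units, n_classes):
    """List (layer name, output shape) pairs for the quoted layer order."""
    shapes = []
    h, w = conv_out(h, w, conv_k)
    shapes.append(("conv1", (h, w, 5)))       # five kernels
    h, w = pool_out(h, w, pool_p)
    shapes.append(("pool1", (h, w, 5)))
    h, w = conv_out(h, w, conv_k)
    shapes.append(("conv2", (h, w, 10)))      # 10 kernels
    h, w = conv_out(h, w, conv_k)
    shapes.append(("conv3", (h, w, 15)))      # 15 kernels
    h, w = pool_out(h, w, pool_p)
    shapes.append(("pool2", (h, w, 15)))
    shapes.append(("flatten", (h * w * 15,)))
    # Dropout sits between flatten and the first dense layer; it keeps the shape.
    shapes.append(("dense", (dense_units,)))
    shapes.append(("output", (n_classes,)))
    return shapes


# Hypothetical settings: 64x64 input, 3x3 kernels, 2x2 pooling, 128 units, 10 classes.
for name, shape in trace_shapes(64, 64, conv_k=3, pool_p=2, dense_units=128, n_classes=10):
    print(name, shape)
```

Changing `conv_k` or `pool_p` shows how strongly the flattened vector length, and therefore the size of the first dense layer, depends on the values missing from the excerpt.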

“…In previous work [11], we have extensively studied the effect of occlusion in the task of HAR. We relied on the HAR approach of Papadakis et al. [20] and simulated occlusion upon removing one or more body parts from 3D human skeletons and during the whole activity. Throughout our experiments, we used a Convolutional Neural Network that had been trained without using any occluded samples.…”

Section: Related Work (mentioning)

confidence: 99%
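The simulated occlusion described in the statement above can be sketched as masking the joints of a body part in every frame of a sequence. The excerpt does not say how a removed body part is represented, so replacing its joints with zeros is an assumption, as are the function name `occlude_joints`, the four-joint skeleton, and the joint indices standing in for an arm.

```python
def occlude_joints(sequence, hidden_joints, fill=(0.0, 0.0, 0.0)):
    """Return a copy of `sequence` with the given joints masked in every frame.

    `sequence` is a list of frames; each frame is a list of (x, y, z) joints.
    Zero-filling is an assumed stand-in for "removing" a body part.
    """
    hidden = set(hidden_joints)
    return [
        [fill if j in hidden else tuple(joint) for j, joint in enumerate(frame)]
        for frame in sequence
    ]


# Hypothetical 4-joint skeleton; indices 2 and 3 stand in for one arm.
ARM = [2, 3]
clip = [
    [(0.0, 1.0, 2.0), (0.1, 1.1, 2.1), (0.2, 1.2, 2.2), (0.3, 1.3, 2.3)],
    [(0.0, 1.0, 2.0), (0.1, 1.2, 2.1), (0.3, 1.4, 2.2), (0.5, 1.6, 2.3)],
]
occluded = occlude_joints(clip, ARM)
print(occluded[1])
```

Because the mask is applied to every frame, the body part stays hidden for the whole activity, which matches the setting described in the quoted text.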

Human Activity Recognition in the Presence of Occlusion

Vernikos, Spyropoulos, Spyrou, et al. 2023. Sensors. Self-citation.

The presence of occlusion in human activity recognition (HAR) tasks hinders the performance of recognition algorithms, as it is responsible for the loss of crucial motion data. Although it is intuitive that it may occur in almost any real-life environment, it is often underestimated in most research works, which tend to rely on datasets that have been collected under ideal conditions, i.e., without any occlusion. In this work, we present an approach that aimed to deal with occlusion in an HAR task. We relied on previous work on HAR and artificially created occluded data samples, assuming that occlusion may prevent the recognition of one or two body parts. The HAR approach we used is based on a Convolutional Neural Network (CNN) that has been trained using 2D representations of 3D skeletal motion. We considered cases in which the network was trained with and without occluded samples and evaluated our approach in single-view, cross-view, and cross-subject cases and using two large scale human motion datasets. Our experimental results indicate that the proposed training strategy is able to provide a significant boost of performance in the presence of occlusion.

“…Annotated data are usually pre-processed with several cleaning methodologies prior to being used as input for an ML algorithm. This step may include, e.g., treating actions as signals and then using signal processing techniques to transform them into images [21, 22], utilizing low-resolution RGB frames or cropping the central area of the frames [23] or even considering short- and long-term dependencies based on depth [24]. Then, ML/DL algorithms are applied to those data for action recognition.…”

Section: Related Work (mentioning)

confidence: 99%

A Multi-Modal Egocentric Activity Recognition Approach towards Video Domain Generalization

Papadakis, Spyrou. 2024. Sensors. Self-citation.

Egocentric activity recognition is a prominent computer vision task that is based on the use of wearable cameras. Since egocentric videos are captured through the perspective of the person wearing the camera, her/his body motions severely complicate the video content, imposing several challenges. In this work we propose a novel approach for domain-generalized egocentric human activity recognition. Typical approaches use a large amount of training data, aiming to cover all possible variants of each action. Moreover, several recent approaches have attempted to handle discrepancies between domains with a variety of costly and mostly unsupervised domain adaptation methods. In our approach we show that through simple manipulation of available source domain data and with minor involvement from the target domain, we are able to produce robust models, able to adequately predict human activity in egocentric video sequences. To this end, we introduce a novel three-stream deep neural network architecture combining elements of vision transformers and residual neural networks which are trained using multi-modal data. We evaluate the proposed approach using a challenging, egocentric video dataset and demonstrate its superiority over recent, state-of-the-art research works.

“…There have always been many convoluted image-based visual tasks to settle in computer vision [49][50][51], such as image retrieval, image classification, semantic segmentation, image captioning, etc. However, the emergence of innovative computational models and learning algorithms has provided new approaches for solving those highly demanding tasks.…”

Section: The Proposed Research Framework (mentioning)

confidence: 99%

Study on Representation Invariances of CNNs and Human Visual Information Processing Based on Data Augmentation

et al. 2020

Representation invariance plays a significant role in the performance of deep convolutional neural networks (CNNs) and human visual information processing in various complicated image-based tasks. However, there has been abounding confusion concerning the representation invariance mechanisms of the two sophisticated systems. To investigate their relationship under common conditions, we proposed a representation invariance analysis approach based on data augmentation technology. Firstly, the original image library was expanded by data augmentation. The representation invariances of CNNs and the ventral visual stream were then studied by comparing the similarities of the corresponding layer features of CNNs and the prediction performance of visual encoding models based on functional magnetic resonance imaging (fMRI) before and after data augmentation. Our experimental results suggest that the architecture of CNNs, combinations of convolutional and fully-connected layers, developed representation invariance of CNNs. Remarkably, we found representation invariance belongs to all successive stages of the ventral visual stream. Hence, the internal correlation between CNNs and the human visual system in representation invariance was revealed. Our study promotes the advancement of invariant representation of computer vision and deeper comprehension of the representation invariance mechanism of human visual information processing.
