2021 18th International Multi-Conference on Systems, Signals & Devices (SSD)
DOI: 10.1109/ssd52085.2021.9429429
3D CNN for Human Action Recognition

Cited by 7 publications (7 citation statements)
References 30 publications
“…Therefore, 3D CNNs can learn spatiotemporal features for continuous frame data. However, 3D CNNs have the disadvantage of requiring a large amount of computation for training due to large frame datasets and 3D convolutional filters [15].…”
Section: Related Work
confidence: 99%
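To make the computational argument in this citation concrete, here is a minimal PyTorch sketch (our illustration, not code from the cited work) comparing the weight counts of a 2D and a 3D convolutional layer with otherwise matching settings; the extra temporal kernel dimension multiplies both the parameters and the activations to be processed.

```python
import torch
import torch.nn as nn

# 2D convolution: 3x3 kernel over a single frame's feature map
conv2d = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)

# 3D convolution: 3x3x3 kernel over a clip (adds a temporal dimension)
conv3d = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=3, padding=1)

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(conv2d))  # 3*64*3*3 + 64   = 1,792 weights
print(params(conv3d))  # 3*64*3*3*3 + 64 = 5,248 weights (~3x)

# Activations grow too: a 16-frame clip costs ~16x the 2D feature maps.
clip = torch.randn(1, 3, 16, 112, 112)   # (batch, channels, frames, H, W)
print(conv3d(clip).shape)                 # torch.Size([1, 64, 16, 112, 112])
```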
“…It computes the center x of x1 and x2 and the center y of y1 and y2, returning the coordinates (x, y). Euclidean_Distance calculates the Euclidean distance between the centroid coordinates of detected individuals (lines 9-11). Grouping clusters adjacent centroid coordinates (lines 12-19). Depending on the number of individuals detected in the current frame, the Euclidean distance of the centroid coordinates is calculated.…”
Section: Object Grouping Algorithm
confidence: 99%
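The citing paper's pseudocode is not reproduced in the quote, so the following NumPy sketch is a reconstruction under assumptions: the helper names centroid, euclidean_distance and group_adjacent, and the 50-pixel threshold, are ours, not from the cited paper. It shows the centroid and distance computations the citation walks through.

```python
import numpy as np

def centroid(box):
    """Center (x, y) of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def euclidean_distance(c1, c2):
    """Euclidean distance between two centroid coordinates."""
    return float(np.hypot(c1[0] - c2[0], c1[1] - c2[1]))

def group_adjacent(boxes, threshold=50.0):
    """Greedily cluster detections whose centroids lie within threshold pixels."""
    centers = [centroid(b) for b in boxes]
    groups = []
    for i, c in enumerate(centers):
        placed = False
        for g in groups:
            if any(euclidean_distance(c, centers[j]) < threshold for j in g):
                g.append(i)
                placed = True
                break
        if not placed:
            groups.append([i])
    return groups

print(group_adjacent([(0, 0, 10, 10), (5, 5, 15, 15), (200, 200, 220, 220)]))
# [[0, 1], [2]] -> the first two detections are grouped, the third stands alone
```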
“…Others have gone one step further by proposing 3D architectures that include a variety of hidden layers, consisting of 3D convolutional, dropout, 3D pooling and fully connected layers for spatio-temporal consideration of third-person perspective video data (Almaadeed et al 2019; Basha, Pulabaigari, et al 2020; Boualia and Amara 2021; Wan et al 2020). These state-of-the-art models have reached classification performances of over 90% on well-known datasets such as UCF101 (Soomro et al 2012), HMDB51 (Kuehne et al 2011) and KTH (Schüldt et al 2004).…”
Section: Action Recognition
confidence: 99%
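For readers unfamiliar with the layer stack this citation lists, the sketch below is a minimal, illustrative 3D CNN in PyTorch combining 3D convolutional, 3D pooling, dropout and fully connected layers; all layer sizes are assumptions, not taken from any of the cited architectures.

```python
import torch
import torch.nn as nn

class Simple3DCNN(nn.Module):
    def __init__(self, num_classes=101):  # e.g. 101 classes for UCF101
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),          # halves frames, H and W
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(64 * 4 * 28 * 28, num_classes),  # for 16x112x112 clips
        )

    def forward(self, x):          # x: (batch, 3, frames, H, W)
        return self.classifier(self.features(x))

model = Simple3DCNN()
logits = model(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)                # torch.Size([2, 101])
```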
“…The majority of these video-based HAR methods use a combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to extract image-based features, as well as to leverage temporal relationships between video frames (Basha et al 2020a, b; Dai et al 2020; Garcia-Hernando et al 2018; Kapidis et al 2021; Ng et al 2015). When trained with large quantities of training data from publicly available action recognition datasets, neural network-based models have been able to reach recognition accuracies of over 90% for everyday actions (Boualia and Amara 2021; Wan et al 2020).…”
Section: Introduction
confidence: 99%
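The CNN-plus-RNN pattern this citation describes can be sketched in a few lines of PyTorch. The model below is our illustration with assumed sizes, not code from the cited papers: a small 2D CNN encodes each frame, and an LSTM models the temporal relationships across the resulting feature sequence.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, num_classes=10, feat_dim=128, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(                 # tiny per-frame encoder
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, feat_dim),
        )
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, clip):                      # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1))      # (B*T, feat_dim)
        feats = feats.view(b, t, -1)              # back to (B, T, feat_dim)
        _, (h, _) = self.rnn(feats)               # h: (1, B, hidden)
        return self.fc(h[-1])                     # classify from last state

logits = CNNLSTM()(torch.randn(2, 16, 3, 64, 64))
print(logits.shape)                               # torch.Size([2, 10])
```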
“…Boualia and Amara [29] submitted video image frames to a 3D CNN to describe the action involved without preprocessing or feature extraction. Mishra et al [30] identified regions of interest in the input video image frames, then used MHI and motion energy images to extract features.…”
Section: State of the Art
confidence: 99%
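As background on the features Mishra et al are said to use, the following NumPy sketch (our illustration, with an assumed decay constant tau and difference threshold) computes a simple motion history image (MHI): pixels that moved recently carry the largest values, and older motion decays toward zero.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=15, thresh=30):
    """One MHI step: set moving pixels to tau, decay the rest by 1."""
    motion = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16)) > thresh
    return np.where(motion, tau, np.maximum(mhi - 1, 0))

# Toy example: a bright square moves one pixel right per frame.
frames = []
for t in range(5):
    f = np.zeros((32, 32), dtype=np.uint8)
    f[10:20, 5 + t:15 + t] = 255
    frames.append(f)

mhi = np.zeros((32, 32), dtype=np.int16)
for prev, cur in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev, cur)

print(mhi.max(), (mhi > 0).sum())  # most recent motion has the highest values
```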