2019
DOI: 10.1049/iet-cvi.2018.5088

Multi‐stream 3D CNN structure for human action recognition trained by limited data

Abstract: Here, the authors proposed a solution to improve the training performance in limited training data case for human action recognition. The authors proposed three different convolutional neural network (CNN) architectures for this purpose. At first, the authors generated four different channels of information by optical flows and gradients in the horizontal and vertical directions from each frame to apply to three-dimensional (3D) CNNs. Then, the authors proposed three architectures, which are single-stream, two…
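
As a rough illustration of the gradient part of the input channels mentioned in the abstract, the sketch below computes horizontal and vertical intensity gradients for every frame of a grayscale clip. It is not the authors' implementation: the function name `gradient_channels`, the use of `np.gradient` (central differences), and the channel ordering are assumptions made here, and the optical-flow channels are left out because they would need a separate flow estimator.

```python
import numpy as np

def gradient_channels(clip):
    """Return per-frame horizontal and vertical gradients.

    clip: array of shape (T, H, W) holding grayscale frames.
    Output: array of shape (T, H, W, 2), channel 0 = d/dx, channel 1 = d/dy.
    (Hypothetical helper; the channel layout is an assumption.)
    """
    clip = np.asarray(clip, dtype=np.float64)
    # Over axes (1, 2), np.gradient returns the row-wise (vertical) gradient
    # first and the column-wise (horizontal) gradient second.
    grad_y, grad_x = np.gradient(clip, axis=(1, 2))
    return np.stack([grad_x, grad_y], axis=-1)

# Toy clip: 3 frames of 4x5 pixels whose intensity ramps from left to right.
clip = np.tile(np.arange(5.0), (3, 4, 1))
channels = gradient_channels(clip)
```

Stacking the result with two optical-flow components along the last axis would give a four-channel volume of the kind the abstract describes for input to a 3D CNN.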

Cited by 30 publications (24 citation statements)
References 32 publications

“…As shown in Figure 4, after inputting the action feature vector obtained from the action video frame sequence into the self-organizing mapping network (SOM), through competitive learning, the trained weights of neurons in the competitive layer can be obtained [23]. The Euclidean distance between the weights of different neurons in the competitive layer of the feature vector and SOM neural network is obtained [24][25][26][27][28][29][30][31]. According to the Euclidean distance, the feature vectors are classified into different neurons; the last neuron is traversed to find out the nearest feature vector of each non-control neuron as the key frame [32][33][34][35][36][37][38][39].…”
Section: Human Action Recognition Based On Voting Strategy Of Multi-feature Classification Results Combined With SOM
citation type: mentioning
confidence: 99%
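
The key-frame selection described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the cited paper's code: the one-dimensional neuron layout, the learning-rate and neighbourhood schedules, and the names `train_som` and `select_key_frames` are all assumptions, and "non-control neuron" is read here simply as a neuron that received at least one feature vector.

```python
import numpy as np

def train_som(features, n_neurons=4, epochs=50, lr=0.5, sigma=1.0, seed=0):
    """Train a small 1-D SOM by competitive learning (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    # Initialise neuron weights from randomly chosen feature vectors.
    start = rng.choice(len(features), n_neurons, replace=False)
    weights = features[start].astype(float)
    positions = np.arange(n_neurons)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # shrinks learning rate and neighbourhood
        for x in features[rng.permutation(len(features))]:
            # The winner is the neuron whose weights are closest (Euclidean).
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            width = sigma * decay + 1e-3
            h = np.exp(-((positions - winner) ** 2) / (2 * width ** 2))
            weights += (lr * decay) * h[:, None] * (x - weights)
    return weights

def select_key_frames(features, weights):
    """Assign each frame to its nearest neuron, then keep the closest frame per neuron."""
    # dists[i, j] = Euclidean distance between frame i and neuron j.
    dists = np.linalg.norm(features[:, None, :] - weights[None, :, :], axis=2)
    assignment = dists.argmin(axis=1)
    key_frames = []
    for j in range(weights.shape[0]):
        members = np.flatnonzero(assignment == j)
        if members.size:  # skip neurons that attracted no frames
            key_frames.append(int(members[dists[members, j].argmin()]))
    return sorted(key_frames)

# Toy usage: 20 frames, each described by a 3-dimensional feature vector.
features = np.random.default_rng(1).normal(size=(20, 3))
weights = train_som(features)
key_frames = select_key_frames(features, weights)
```

The number of neurons bounds the number of key frames, so in this sketch `n_neurons` acts as the knob for how many frames are kept per sequence.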
“…This method segmented the image in advance and then extracted the action features, so as to obtain the Gaussian distribution model of the action image background and realize the extraction of human action, but the feature extraction clustering of this method is poor. Literature [9] proposed a human motion recognition method based on 3D CNN, which encodes the motion information to recognize and extract the motion. However, the stability of motion feature extraction is low.…”
Section: Related Work
citation type: mentioning
confidence: 99%
“…The BinaryDataCost function of (6) consists of the sum of the delta function. As shown in (7), calculating the delta function simply yields 0 and 1 outputs, and is faster than calculating L1 norm. This modified data term reduces the computational cost than conventional SIFT flow but maintains the performance of human action recognition.…”
Section: Modified Energy Function
citation type: mentioning
confidence: 99%
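
To make the contrast in the statement above concrete, the sketch below compares an L1 data cost with a delta-style cost that only counts mismatching descriptor components. Equations (6) and (7) of the citing paper are not reproduced on this page, so the exact form of its BinaryDataCost is unknown; the functions `l1_data_cost` and `binary_data_cost`, and the convention that the indicator is 1 where components differ, are assumptions for illustration only.

```python
import numpy as np

def l1_data_cost(d1, d2):
    """L1 norm of the difference between two descriptors."""
    return np.abs(d1.astype(np.int64) - d2.astype(np.int64)).sum(axis=-1)

def binary_data_cost(d1, d2):
    """Sum of 0/1 indicators: 1 where components differ, 0 where they match.

    Assumed convention; the citing paper's definition may differ.
    """
    return (d1 != d2).sum(axis=-1)

# Toy descriptors that differ in two of four components.
a = np.array([3, 0, 7, 2])
b = np.array([3, 5, 7, 0])
l1_cost = l1_data_cost(a, b)          # |0| + |5| + |0| + |2| = 7
binary_cost = binary_data_cost(a, b)  # two mismatching components = 2
```

The delta-style cost ignores how far apart two components are, which is what reduces each term to a 0 or 1 output.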
“…However, real videos of human actions do not satisfy the above assumption, especially in the ADAS system and in autonomous vehicles, which results in the optical flow-based two-stream CNN having low performance for human action recognition. To overcome this limitation of optical flow, many researchers have modified the two-stream CNN network model [13,37] with a long short-term memory (LSTM) [12,45], a complicated pooling layer [14,42,46,47,49], or more than three stream networks [7,28,39,44]. Although the modifications improve their action recognition accuracy, the modified optical-flow two-stream CNNs have been found in recent research to have the same limitations [31].…”
Section: Introduction
citation type: mentioning
confidence: 99%