2019
DOI: 10.1016/j.patrec.2018.05.018

A review of Convolutional-Neural-Network-based action recognition

Cited by 278 publications (117 citation statements) · References 27 publications
“…Human action recognition [1][2][3][4][5][6] is one of the most important research fields in computer vision. Although recognizing the motion of human action in video can provide discriminative clues for classifying one specific action, many human actions (e.g., "Phoning," "InteractingWithComputer," and "Shooting," as shown in Figure 1) can be represented by one single still image [2]. In particular, certain actions (e.g., "PlayingGuitar," "RidingHorse," and "Running," as shown in Figure 1) may require static cue-based approaches even if those motions in videos are available [2].…”
Section: Introduction (mentioning, confidence: 99%)
“…Although recognizing the motion of human action in video can provide discriminative clues for classifying one specific action, many human actions (e.g., "Phoning," "InteractingWithComputer," and "Shooting," as shown in Figure 1) can be represented by one single still image [2]. In particular, certain actions (e.g., "PlayingGuitar," "RidingHorse," and "Running," as shown in Figure 1) may require static cue-based approaches even if those motions in videos are available [2]. Recognizing these human actions with the video-based approaches mentioned above [5,6,8] may be inappropriate, because their slight motion changes lack distinguishability.…”
Section: Introduction (mentioning, confidence: 99%)
“…Video content representation, that is, feature extraction, is the core of video action recognition [Yao, Lei and Zhong (2019)]. Whether the video content can be effectively extracted and characterized will directly determine the action recognition performance.…”
Section: Deep Feature (mentioning, confidence: 99%)
“…As noted in (G. Yao et al., 2019), there are two ways to represent actions for recognition: the handcrafted representation method (Caba Heilbron et al., 2016; Mettes et al., 2015; Yu, 2015) and the deep-learning representation method (I. Goodfellow, 2016). In the first, features are extracted manually, and it is generally used as a baseline against which new deep-learning representations are evaluated; the deep-learning representation method, by contrast, learns trainable features automatically from videos (G. Yao et al., 2019). Recent work on automatic recognition of human events thus suggests that researchers in this area aim to go beyond features designed by hand.…”
Section: Introduction (mentioning, confidence: 99%)
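The handcrafted-versus-learned distinction drawn in the excerpt above can be made concrete with a minimal sketch (an illustration only, not code from the reviewed paper): a fixed gradient-orientation histogram stands in for a handcrafted descriptor, while a single 2-D convolution with ReLU stands in for one learned CNN feature map, whose kernel weights would be trained in practice rather than fixed.

```python
import numpy as np

def gradient_histogram(image, n_bins=8):
    """Handcrafted feature: a magnitude-weighted histogram of gradient
    orientations (HOG-like). The design is fixed by hand, not learned."""
    gy, gx = np.gradient(image.astype(float))
    angles = np.arctan2(gy, gx)        # orientation in [-pi, pi]
    magnitudes = np.hypot(gx, gy)      # gradient strength
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi),
                           weights=magnitudes)
    total = hist.sum()
    return hist / total if total > 0 else hist

def conv_feature(image, kernel):
    """Learned feature: one 'valid' 2-D convolution followed by ReLU.
    In a CNN the kernel entries are the trainable parameters."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0)          # ReLU non-linearity

rng = np.random.default_rng(0)
frame = rng.random((16, 16))                                # stand-in for a still image
handcrafted = gradient_histogram(frame)                     # fixed descriptor, shape (8,)
learned = conv_feature(frame, rng.standard_normal((3, 3)))  # feature map, shape (14, 14)
```

The contrast mirrors the excerpt: the histogram's behavior is determined entirely by its manual design, whereas the convolution's usefulness depends on weights that a deep-learning method would fit to the video data.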