Real-Time Human Action Recognition Using CNN Over Temporal Images for Static Video Surveillance Cameras

Jin, Chan; Li, Shengzhe; Dung, Trung; Kim, Hakil

doi:10.1007/978-3-319-24078-7_33

Cited by 24 publications (9 citation statements); references 16 publications


“…However, the main drawback of this network is the difficulty and time consumption for training in comparison to convolutional neural networks (CNN) [65]. Additionally, current researches showed that CNN has a great performance for image processing in real time situations [26,65,72,73,74], where the input data are much more complicated than 1D time series signals. As proposed in [65], a 1D-CNN, named CollisionNet, has a proper potential in detecting collision, although only incidental contacts have been considered.…”

Section: Contact Type Detection (mentioning)

confidence: 99%

A Mixed-Perception Approach for Safe Human–Robot Collaboration in Industrial Automation

Amin, Rezayati, Venn, et al. (2020), Sensors


Digital-enabled manufacturing systems require a high level of automation for fast and low-cost production but should also present flexibility and adaptiveness to varying and dynamic conditions in their environment, including the presence of human beings; however, this presence of workers in the shared workspace with robots decreases the productivity, as the robot is not aware about the human position and intention, which leads to concerns about human safety. This issue is addressed in this work by designing a reliable safety monitoring system for collaborative robots (cobots). The main idea here is to significantly enhance safety using a combination of recognition of human actions using visual perception and at the same time interpreting physical human–robot contact by tactile perception. Two datasets containing contact and vision data are collected by using different volunteers. The action recognition system classifies human actions using the skeleton representation of the latter when entering the shared workspace and the contact detection system distinguishes between intentional and incidental interactions if physical contact between human and cobot takes place. Two different deep learning networks are used for human action recognition and contact detection, which in combination, are expected to lead to the enhancement of human safety and an increase in the level of cobot perception about human intentions. The results show a promising path for future AI-driven solutions in safe and productive human–robot collaboration (HRC) in industrial automation.
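The citation statement above contrasts a 1D-CNN (CollisionNet) that runs on time-series contact signals with image-based CNNs. The following minimal sketch shows what a single 1D convolutional filter computes on such a signal. It assumes a valid (unpadded), stride-1 cross-correlation followed by ReLU and global max pooling. The trace values, the kernel, and the function names (`conv1d`, `relu`, `tiny_contact_score`) are hypothetical. The sketch does not reproduce CollisionNet's actual architecture or trained weights.

```python
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """Valid (unpadded), stride-1 1D cross-correlation, as a CNN conv layer computes it."""
    k = len(kernel)
    if k == 0 or k > len(signal):
        raise ValueError("kernel must be non-empty and no longer than the signal")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values: Sequence[float]) -> List[float]:
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def tiny_contact_score(signal: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical single-filter pipeline: conv1d -> ReLU -> global max pooling."""
    return max(relu(conv1d(signal, kernel, bias)))


# Hypothetical sensor trace with a sudden rise, and a difference kernel that responds to rises.
trace = [0, 0, 0, 5, 5, 5]
print(conv1d(trace, [-1, 1]))              # [0.0, 0.0, 5.0, 0.0, 0.0]
print(tiny_contact_score(trace, [-1, 1]))  # 5.0
```

A real 1D-CNN stacks many such filters, each with learned kernels, so that the network can separate different contact patterns. The hand-picked difference kernel here only illustrates the mechanics.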


“…The advantage of these methods is that they are simple, fast, and efficient in controlled environments, for instance, when the background of the surveillance video (from a top-view camera) is always static. The fatal flaw in MHI is that it cannot capture interior motions: it can only capture human shapes [12]. In our work, a novel method for encoding these temporal features is proposed, and a study of how many appearance-based temporal features affect performance is provided.…”

Section: Related Work (mentioning)

confidence: 99%
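The quoted passage refers to the motion history image (MHI). The sketch below is a rough illustration that assumes the classic formulation. In that formulation, a pixel is set to a duration value tau wherever frame differencing exceeds a threshold, and elsewhere its value decays by a fixed step toward zero. The threshold, tau, decay step, and function names are illustrative assumptions. This is not the temporal-image encoding proposed in the citing or cited work.

```python
from typing import List

Grid = List[List[float]]


def update_mhi(mhi: Grid, prev: Grid, curr: Grid, tau: float,
               diff_threshold: float, decay: float = 1.0) -> Grid:
    """One MHI step: H = tau where |curr - prev| > threshold, else max(0, H - decay)."""
    return [
        [
            float(tau) if abs(c - p) > diff_threshold else max(0.0, h - decay)
            for h, p, c in zip(row_h, row_p, row_c)
        ]
        for row_h, row_p, row_c in zip(mhi, prev, curr)
    ]


def motion_history(frames: List[Grid], tau: float, diff_threshold: float,
                   decay: float = 1.0) -> Grid:
    """Fold update_mhi over consecutive frame pairs, starting from an all-zero MHI."""
    height, width = len(frames[0]), len(frames[0][0])
    mhi: Grid = [[0.0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev, curr, tau, diff_threshold, decay)
    return mhi


# A single bright pixel moving left to right across a 1x3 grayscale "video".
frames = [[[255, 0, 0]], [[0, 255, 0]], [[0, 0, 255]]]
print(motion_history(frames, tau=3, diff_threshold=30))  # [[2.0, 3.0, 3.0]]
```

The most recent motion keeps the highest values, so the result encodes where, and roughly when, the silhouette moved. Pixels whose intensity does not change between frames are never marked, such as those inside a uniformly colored moving region. This offers one way to read the quoted remark that MHI captures shape rather than interior motion.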

“…Many works have been studied to estimate human pose [7][8][9][10] and analyze motion information [11] in real time. However, to the best of our knowledge, the real-time multilevel action descriptor was first introduced by the authors in [12] and this work is the extended version by adding two new actions, bicycling and phoning, and the evaluation of the processing time.…”

Section: Introduction (mentioning)

confidence: 99%

Real-Time Action Recognition Using Multi-level Action Descriptor and DNN

Jin, Dung, Liu, et al. (2019), Intelligent Video Surveillance

Self Cite


This work presents a novel approach to the problem of real-time human action recognition in intelligent video surveillance. For more efficient and precise labeling of an action, this work proposes a multilevel action descriptor, which delivers complete information of human actions. The action descriptor consists of three levels: posture, locomotion, and gesture level; each of which corresponds to a different group of subactions describing a single human action, for example, smoking while walking. The proposed action recognition method is able to localize and recognize simultaneously the actions of multiple individuals using appearance-based temporal features with multiple convolutional neural networks (CNN). Although appearance cues have been successfully exploited for visual recognition problems, appearance, motion history, and their combined cues with multi-CNNs have not yet been explored. Additionally, the first systematic estimation of several hyperparameters for shape and motion history cues is investigated. The proposed approach achieves a mean average precision (mAP) of 73.2% in the frame-based evaluation over the newly collected large-scale ICVL video dataset. The action recognition model can run at around 25 frames per second, which is suitable for real-time surveillance applications.
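The abstract reports its result as a mean average precision (mAP) in a frame-based evaluation. The exact protocol is not stated here, including any interpolation and how detections of multiple individuals are matched per frame. The sketch below therefore assumes the plain, non-interpolated average precision over a ranked list of per-frame predictions for each action class, averaged across classes without weighting. The class names, scores, and function names are made up for illustration.

```python
from typing import Dict, List, Tuple

# (confidence score, whether this per-frame prediction is a true positive)
Scored = List[Tuple[float, bool]]


def average_precision(scored: Scored, num_positives: int) -> float:
    """Non-interpolated AP: sum of precision@k at each true-positive rank k, divided by #positives."""
    if num_positives <= 0:
        raise ValueError("num_positives must be positive")
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_positives


def mean_average_precision(per_class: Dict[str, Tuple[Scored, int]]) -> float:
    """Unweighted mean of per-class AP values."""
    aps = [average_precision(scored, n) for scored, n in per_class.values()]
    return sum(aps) / len(aps)


# Made-up per-frame predictions for two hypothetical action classes.
results = {
    "walking": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "smoking": ([(0.6, True)], 1),
}
print(round(mean_average_precision(results), 4))  # 0.9167
```

For the quoted throughput, 25 frames per second corresponds to a processing budget of 1000 / 25 = 40 ms per frame.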


“…Nowadays, deep learning is a hot topic in machine learning, and CNN is one of deep learning methods, which can learn hierarchical features from low-level data [12]. Xia et al proposed a robust and effective facial occlusion detection method based on CNN and multi-task learning [13].…”

Section: Related Work (mentioning)

confidence: 99%

Face Occlusion Detection Using Skin Color Ratio and LBP Features for Intelligent Video Surveillance Systems

Kim, Yang (2016), Annals of Computer Science and Information Systems


A face occlusion detection scheme based on both the skin color ratio (SCR) and Local Binary Pattern (LBP) features is proposed. The proposed method consists of four main steps: foreground extraction, head detection, feature extraction, and occlusion detection. First, the foreground is extracted by a codebook background subtraction algorithm. Then, the head region is located using a HOG head detector. After that, the skin color ratio and LBP features are extracted. Finally, an SVM is trained on the LBP features. The recognition result of the SVM and the result of the skin color ratio feature are merged by a weighted voting strategy, and faces are then classified into three categories: concealed, partially concealed, and visible. Experimental results show that the proposed detection system can achieve desirable results in intelligent video surveillance systems.
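The two hand-crafted features named in this abstract can be sketched in a few lines. The sketch below assumes the basic 3x3, 8-neighbour LBP operator with a 256-bin normalised histogram. It also assumes a skin color ratio computed with a fixed Cb/Cr box in YCbCr space, using the commonly cited ranges Cb 77 to 127 and Cr 133 to 173. The paper's actual LBP variant and skin thresholds are not given here, and the function names are hypothetical. The codebook background subtraction, HOG head detector, SVM, and weighted voting steps are not sketched.

```python
from typing import List, Sequence, Tuple

# 8 neighbours of a 3x3 window, clockwise from the top-left pixel (one common convention).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], y: int, x: int) -> int:
    """Basic 8-bit LBP: bit i is set when neighbour i >= the centre pixel."""
    center = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: List[List[int]]) -> List[float]:
    """Normalised 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 256


def skin_color_ratio(pixels: Sequence[Tuple[int, int, int]],
                     cb_range: Tuple[int, int] = (77, 127),
                     cr_range: Tuple[int, int] = (133, 173)) -> float:
    """Fraction of (Y, Cb, Cr) pixels inside an assumed fixed Cb/Cr skin box."""
    if not pixels:
        return 0.0
    skin = sum(
        1 for _, cb, cr in pixels
        if cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]
    )
    return skin / len(pixels)


patch = [[9, 1, 9], [1, 5, 1], [9, 1, 9]]
print(lbp_code(patch, 1, 1))  # 85 (bits 0, 2, 4, 6 set)
print(skin_color_ratio([(120, 100, 150), (120, 60, 150), (50, 110, 140), (200, 128, 150)]))  # 0.5
```

In a pipeline like the one described, the LBP histogram of the head region would be the SVM input. The skin ratio would be a separate cue, since an occluded face exposes less skin.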
