Two-stream RNN/CNN for action recognition in 3D videos

Zhao, Rui; Ali, Haider; Smagt, Patrick van der

doi:10.1109/iros.2017.8206288

Cited by 86 publications (57 citation statements)

References 64 publications (65 reference statements)


“…Usually, a first stage of convolutional layers extracts features from the raw data and generates high-level representations in deeper layers, then a second stage of recurrent layers uses the features yielded by the convolutional layers to learn time dependencies. Examples of applications are action recognition in videos (Sainath et al., 2015; Donahue et al., 2017; Zhao et al., 2017a) and speech recognition (Zhao et al., 2017b).…”

Section: Introduction (mentioning)

confidence: 99%

Deep Learning for Image Sequence Classification of Astronomical Events

Carrasco-Davis

Cabrera-Vives

Förster

et al. 2019

PASP


We propose a new sequential classification model for astronomical objects based on a recurrent convolutional neural network (RCNN) which uses sequences of images as inputs. This approach avoids the computation of light curves or difference images. This is the first time that sequences of images are used directly for the classification of variable objects in astronomy. The second contribution of this work is the image simulation process. We generate synthetic image sequences that take into account the instrumental and observing conditions, obtaining a realistic, unevenly sampled, and variable noise set of movies for each astronomical object. The simulated dataset is used to train our RCNN classifier. This approach allows us to generate datasets to train and test our RCNN model for different astronomical surveys and telescopes. Moreover, using a simulated dataset is faster and more adaptable to different surveys and classification tasks. We aim at building a simulated dataset whose distribution is close enough to the real dataset, so that a fine tuning could match the distributions and solve the domain adaptation problem. To test the RCNN classifier trained with the synthetic dataset, we used real-world data from the High cadence Transient Survey (HiTS) obtaining an average recall of 85%, improved to 94% after performing fine tuning with 10 real samples per class. We compare the results of our RCNN model with those of a light curve random forest classifier. The proposed RCNN with fine tuning has a similar performance on the HiTS dataset compared to the light curve random forest classifier, trained on an augmented training set with 10 real samples per class. The RCNN approach presents several advantages in an alert stream classification scenario, such as a reduction of the data pre-processing, faster online evaluation and easier performance improvement using a few real data samples. 
The results obtained encourage us to use the proposed method for astronomical alert brokers systems that will process alert streams generated by new telescopes such as the Large Synoptic Survey Telescope.
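The two-stage pipeline described in the citation statement above (a convolutional stage that extracts per-frame features, followed by a recurrent stage that learns time dependencies) can be sketched roughly as follows. This is a minimal, untrained illustration in plain Python, not the architecture of any of the cited papers: the function names, kernel count, hidden size, and random weights are all assumptions chosen for the example.

```python
import math
import random


def conv2d_valid(frame, kernel):
    """'Valid' 2D cross-correlation of one frame with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            sum(frame[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def frame_features(frame, kernels):
    """Convolutional stage: one ReLU + global-average-pooled value per kernel."""
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(frame, kernel)
        activations = [max(0.0, v) for row in fmap for v in row]
        features.append(sum(activations) / len(activations))
    return features


def rnn_last_state(sequence, w_in, w_rec):
    """Recurrent stage: simple Elman RNN; returns the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w_in[h][i] * x[i] for i in range(len(x)))
                + sum(w_rec[h][g] * hidden[g] for g in range(len(hidden)))
            )
            for h in range(len(w_rec))
        ]
    return hidden


def encode_clip(frames, kernels, w_in, w_rec):
    """Per-frame convolutional features fed, in order, to the recurrent stage."""
    return rnn_last_state([frame_features(f, kernels) for f in frames],
                          w_in, w_rec)


# Hypothetical sizes: 6 frames of 8x8 pixels, 4 kernels of 3x3, 5 hidden units.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


kernels = [rand_matrix(3, 3) for _ in range(4)]
w_in = rand_matrix(5, 4)    # input-to-hidden weights (untrained)
w_rec = rand_matrix(5, 5)   # hidden-to-hidden weights (untrained)
frames = [rand_matrix(8, 8) for _ in range(6)]
clip_code = encode_clip(frames, kernels, w_in, w_rec)
```

In a real system the kernels and recurrent weights would be learned jointly, and a classifier would be applied to `clip_code`; the sketch only shows how the data flows from frames to a fixed-length sequence representation.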


“…In such architectures, spatial information is extracted through CNNs and is then passed to recurrent networks for learning the temporal characteristics of each interaction class [6,27]. Zhao et al. [170] proposed an approach based on the normalization of each layer of the network with batch normalization [57]. The created architecture is combined with a 3-dimensional ConvNet by using a two-stream fusion of the RNN and ConvNet, with an SVM.…”

Section: Recurrent Network (mentioning)

confidence: 99%

Analyzing human–human interactions: A survey

Stergiou

Poppe

2019

Computer Vision and Image Understanding


Many videos depict people, and it is their interactions that inform us of their activities, relation to one another and the cultural and social setting. With advances in human action recognition, researchers have begun to address the automated recognition of these human-human interactions from video. The main challenges stem from dealing with the considerable variation in recording settings, the appearance of the people depicted and the performance of their interaction. This survey provides a summary of these challenges and datasets, followed by an in-depth discussion of relevant vision-based recognition and detection methods. We focus on recent, promising work based on convolutional neural networks (CNNs). Finally, we outline directions to overcome the limitations of the current state-of-the-art.

Main challenges in the field: We identify challenges when dealing with the visual and structural aspects of interaction videos. Additionally, we outline practical challenges in the development of methods of automated human-human action recognition.
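The citation statement above mentions a two-stream fusion of an RNN and a ConvNet. As a rough illustration of what fusing two streams can look like, the sketch below combines per-class scores from two streams by weighted averaging of their softmax probabilities. This is a deliberately simpler stand-in, not the SVM-based fusion the statement attributes to Zhao et al.; the function names, the `rnn_weight` parameter, and the example scores are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(rnn_scores, cnn_scores, rnn_weight=0.5):
    """Late fusion by weighted averaging (an assumption, not the SVM variant).

    Returns the fused class probabilities and the index of the top class.
    """
    if len(rnn_scores) != len(cnn_scores):
        raise ValueError("both streams must score the same set of classes")
    p_rnn = softmax(rnn_scores)
    p_cnn = softmax(cnn_scores)
    fused = [rnn_weight * a + (1.0 - rnn_weight) * b
             for a, b in zip(p_rnn, p_cnn)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return fused, best


# Made-up scores for three action classes from each stream.
rnn_scores = [2.0, 0.5, 0.1]   # e.g. a skeleton/RNN stream
cnn_scores = [0.2, 1.5, 0.3]   # e.g. a video/ConvNet stream
fused_probs, predicted_class = fuse_streams(rnn_scores, cnn_scores)
```

Averaging keeps the example dependency-free; a learned fusion such as an SVM trained on the concatenated stream outputs would replace `fuse_streams` with a fitted classifier.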


“…There are two methods for recognizing specific user states such as falling and tripping using the obtained joint information. First, a study on action recognition in 3D video [30] and a study on skeleton extraction [31] implemented a deep-learning model that can recognize falling. However, to recognize motions, it can be challenging to crop only the target motion and use it as an input.…”

Section: (A) (mentioning)

confidence: 99%

Deep-cARe: Projection-Based Home Care Augmented Reality System with Deep Learning for Elderly

Park

Lee

et al. 2019

Applied Sciences


Developing innovative and pervasive smart technologies that provide medical support and improve the welfare of the elderly has become increasingly important as populations age. Elderly people frequently experience incidents of discomfort in their daily lives, including the deterioration of cognitive and memory abilities. To provide auxiliary functions and ensure the safety of the elderly in daily living situations, we propose a projection-based augmented reality (PAR) system equipped with a deep-learning module. In this study, we propose three-dimensional space reconstruction of a pervasive PAR space for the elderly. In addition, we propose the application of a deep-learning module to lay the foundation for contextual awareness. Performance experiments were conducted for grafting the deep-learning framework (pose estimation, face recognition, and object detection) onto the PAR technology through the proposed hardware for verification of execution possibility, real-time execution, and applicability. The precision of the face pose is particularly high by pose estimation; it is used to determine an abnormal user state. For face recognition results of whole class, the average detection rate (DR) was 74.84% and the precision was 78.72%. However, for face occlusions, the average DR was 46.83%. It was confirmed that the face recognition can be performed properly if the face occlusion situation is not frequent. By object detection experiment results, the DR increased as the distance from the system decreased for a small object. For a large object, the miss rate increased when the distance between the object and the system decreased. Scenarios for supporting the elderly, who experience degradation in movement and cognitive functions, were designed and realized, constructed using the proposed platform. In addition, several user interfaces (UI) were implemented according to the scenarios regardless of distance between users and the proposed system. 
In this study, we developed a bidirectional PAR system that provides the relevant information by understanding the user environment and action intentions instead of a unidirectional PAR system for simple information provision. We present a discussion of the possibility of care systems for the elderly through the fusion of PAR and deep-learning frameworks.

Appl. Sci. 2019, 9, 3897

…were usually attended to by their families; however, they are currently forced to lead independent lives [2]. There has been increasing interest in social welfare and medical support owing to the increasing aging rate. The social government is increasing social welfare and medical benefits to support elderly self-sustenance; however, most elderly people attend to various problems in their daily life on their own. Moreover, various diseases as well as physical and mental deterioration can occur with aging [3]. Muscle weakness is a representative example of physical deterioration. Muscle strength of people in their 60s decreases to approximately one half of that in their 20s, resulting in sign...

