2015 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2015.522
Human Action Recognition Using Factorized Spatio-Temporal Convolutional Networks

Abstract: Human actions in video sequences are three-dimensional (3D) spatio-temporal signals characterizing both the visual appearance and motion dynamics of the involved humans and objects. Inspired by the success of convolutional neural networks (CNNs) for image classification, recent attempts have been made to learn 3D CNNs for recognizing human actions in videos. However, partly due to the high complexity of training 3D convolution kernels and the need for large quantities of training videos, only limited success has…
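The abstract contrasts the cost of dense 3D convolution kernels with the factorized spatio-temporal design named in the title. As a hedged illustration of that cost argument (the layer shapes and the temporal-layer design below are assumptions for illustration, not the paper's exact FSTCN architecture), the parameter counts can be sketched: a dense 3D kernel of size k×k×k mapping C_in to C_out channels costs C_in·C_out·k³ weights, while splitting it into a 2D spatial (k×k) convolution followed by a 1D temporal (k) convolution costs substantially fewer.

```python
# Illustrative parameter-count comparison: dense 3D convolution vs. a
# factorized spatial-then-temporal pair. Hypothetical layer shapes; not
# the exact architecture from the FSTCN paper.

def params_3d(c_in: int, c_out: int, k: int) -> int:
    """Weights in one dense 3D conv layer with a k x k x k kernel."""
    return c_in * c_out * k ** 3

def params_factorized(c_in: int, c_out: int, k: int) -> int:
    """Weights when the 3D kernel is factorized into a 2D spatial
    (k x k) convolution followed by a 1D temporal (k) convolution."""
    spatial = c_in * c_out * k * k   # per-frame 2D convolution
    temporal = c_out * c_out * k     # 1D convolution along the time axis
    return spatial + temporal

if __name__ == "__main__":
    c_in, c_out, k = 64, 64, 3
    full = params_3d(c_in, c_out, k)
    fact = params_factorized(c_in, c_out, k)
    print(f"dense 3D: {full} weights, factorized: {fact} weights "
          f"({full / fact:.2f}x fewer)")
```

With these assumed shapes the factorized pair uses well under half the weights of the dense 3D kernel, which is the kind of training-complexity reduction the abstract motivates.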

Cited by 516 publications (348 citation statements)
References 36 publications (67 reference statements)
“…Previous research on the use of deep learning for sleep science has been focused on PSG data [45,46]. In other application areas, deep learning has been used for human activity recognition [47,48] which is a similar technical problem. In a previous study, we combined human recognition of actigraphy data with other machine learning algorithms, but not deep learning [49].…”
Section: Discussion
confidence: 99%
“…In addition to the ResNeXt-50 model, here we also train our model with the deeper ResNeXt-101 [75] and report its performance as well. In order to provide a fair comparison, we split the table into two parts, the ones that incorporate their methods … [a results table (accuracy, %) was flattened into this excerpt; the recoverable rows are:]

Method                             UCF101  HMDB51
CNN-hid6 [80]                      79.3    -
Comp-LSTM [62]                     84.3    44.0
C3D+SVM [65]                       85.2    -
2S-CNN [78]                        88.0    59.4
FSTCN [63]                         88.1    59.1
2S-CNN+Pool [78]                   88.2    -
Objects+Motion(R*) [26]            88.5    61.4
2S-CNN+LSTM [78]                   88.6    -
TDD [70]                           90      (cut off)
[48]                               86.0    60.1
FM+IDT [47]                        87.9    61.1
MIFS+IDT [35]                      89.1    65.1
CNN-hid6+IDT [80]                  89.6    -
C3D Ensemble+IDT (Sports-1M) [65]  90.1    -
C3D+IDT+SVM [65]                   90.4    -
TDD+IDT [70]                       91.5    65.9
Sympathy [9]                       92.5    70.4
Two-Stream Fusion+IDT [15]         93.5    69.2
ST-ResNet+IDT [14]                 94      (cut off)

… [4] has been pre-trained on a large-scale video dataset, Kinetics300k.…”
Section: Dynamic Optical Flow
confidence: 99%
“…There has been a great deal of progress in human activity recognition in video captured from a third-person viewpoint. Early work contributed handcrafted features for feature representation in activity recognition [11,12,13,14,15,16,17,18]. Some studies suggested various methods, such as support vector machines (SVMs) [19,20], unsupervised learning [21], and multi-label learning [22], to improve recognition performance.…”
Section: Related Work
confidence: 99%