2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.721

Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals

Abstract: Physiological signals such as heart rate can provide valuable information about an individual's state and activity. However, existing work on computer vision has not yet explored leveraging these signals to enhance egocentric video understanding. In this work, we propose a model for reasoning on multimodal data to jointly predict activities and energy expenditures. We use heart rate signals as privileged self-supervision to derive energy expenditure in a training stage. A multitask objective is used to jointly…
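The abstract describes a shared model with two coupled outputs. As a rough illustration, the sketch below shows one way such a multitask objective could be wired up in PyTorch: a shared encoder over concatenated multimodal features, an activity-classification head, and an energy-expenditure regression head trained against targets derived from heart rate. The module names, dimensions, and simple concatenation fusion are assumptions for illustration, not the authors' actual architecture.

# Minimal sketch of a multitask objective in the spirit of the abstract:
# a shared encoder over multimodal features feeds two heads that jointly
# predict an activity class and a scalar energy expenditure.
# All names and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class MultitaskNet(nn.Module):
    def __init__(self, video_dim=512, accel_dim=64, hidden_dim=256, num_activities=20):
        super().__init__()
        # Fuse modalities by concatenation (an assumption; the paper's fusion may differ).
        self.encoder = nn.Sequential(
            nn.Linear(video_dim + accel_dim, hidden_dim),
            nn.ReLU(),
        )
        self.activity_head = nn.Linear(hidden_dim, num_activities)  # classification
        self.energy_head = nn.Linear(hidden_dim, 1)                 # regression (e.g. METs)

    def forward(self, video_feat, accel_feat):
        h = self.encoder(torch.cat([video_feat, accel_feat], dim=-1))
        return self.activity_head(h), self.energy_head(h).squeeze(-1)

# Joint loss: cross-entropy on activity labels plus an L2 term on energy
# expenditure targets derived offline from heart rate (the "privileged
# self-supervision" mentioned in the abstract).
def multitask_loss(logits, energy_pred, activity_gt, energy_gt, alpha=1.0):
    return nn.functional.cross_entropy(logits, activity_gt) + \
           alpha * nn.functional.mse_loss(energy_pred, energy_gt)

if __name__ == "__main__":
    model = MultitaskNet()
    video = torch.randn(8, 512)   # e.g. pooled CNN features from egocentric video
    accel = torch.randn(8, 64)    # e.g. features from a wearable sensor
    logits, energy = model(video, accel)
    loss = multitask_loss(logits, energy, torch.randint(0, 20, (8,)), torch.rand(8) * 10)
    loss.backward()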

Cited by 49 publications (51 citation statements).
References 60 publications (72 reference statements).
Citing publications were published between 2018 and 2022.
“…Human activity understanding plays an important role in achieving intelligent systems. Different datasets have been proposed [10,31,15,25] to address the limitations in earlier works. Note that our data can enable research in learning driver behaviors as mentioned in the introduction.…”
Section: Related Work (mentioning)
confidence: 99%
“…Interestingly, they still find it useful to apply their model to multiple optical flow fields and fuse the results with the RGB stream. Some other works use recurrent approaches to model the actions in video [8,18,19,17] or even a single CNN [11]. Donahue et al [8] propose the Long-term Recurrent Convolutional Networks model that combines the CNN features from multiple frames using an LSTM to recognize actions.…”
Section: Action Recognition (mentioning)
confidence: 99%
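The quoted passage summarizes the recurrent recipe of Donahue et al. [8]: CNN features are extracted per frame and aggregated by an LSTM whose final state classifies the action. The sketch below is a minimal illustration of that pattern in PyTorch; the ResNet-18 backbone, layer sizes, and 101-way output are assumptions, not the exact LRCN configuration.

# Minimal sketch of the recurrent recipe described above: per-frame CNN
# features are aggregated by an LSTM, and the final hidden state is
# classified into an action. Backbone choice and sizes are assumptions.
import torch
import torch.nn as nn
from torchvision import models

class RecurrentActionNet(nn.Module):
    def __init__(self, hidden_dim=256, num_actions=101):
        super().__init__()
        backbone = models.resnet18()                # per-frame feature extractor
        feat_dim = backbone.fc.in_features          # 512 for ResNet-18
        backbone.fc = nn.Identity()                 # drop the ImageNet classifier
        self.backbone = backbone
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_actions)

    def forward(self, clips):                       # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1))  # (B*T, feat_dim)
        feats = feats.view(b, t, -1)                # back to (B, T, feat_dim)
        _, (h_n, _) = self.lstm(feats)              # h_n: (1, B, hidden_dim)
        return self.classifier(h_n[-1])             # action logits

if __name__ == "__main__":
    net = RecurrentActionNet()
    logits = net(torch.randn(2, 8, 3, 112, 112))    # 2 clips of 8 frames each
    print(logits.shape)                             # torch.Size([2, 101])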
“…The Database for Emotion Analysis using Physiological signals (DEAP) dataset [17] and the MAHNOB-HCI dataset [18], which sensed EEG signals with facial videos when humans feel emotions, were released in 2012. Since then, many EEG-based emotion recognition algorithms have been developed [6,7,8,52,53].…”
Section: B. EEG-Based Emotion Recognition (mentioning)
confidence: 99%
“…As shown in Fig. 1, recent multimodal deep learning networks are based on a fully-connected (FC) layer structure, adopting an approach of concatenating input modalities or intermediate features for delivery to the next layer [18].…”
Section: Multimodal Deep Learning (mentioning)
confidence: 99%
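The quoted passage describes concatenation-based fusion: each modality (or its intermediate features) is embedded and the concatenated vector is passed through fully-connected layers. Below is a minimal sketch of that pattern, assuming two hypothetical input modalities and illustrative dimensions.

# Minimal sketch of FC-based concatenation fusion: embed each modality,
# concatenate the embeddings, and map the joint vector to a prediction.
# Modality names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class ConcatFusionNet(nn.Module):
    def __init__(self, dims=(128, 64), hidden_dim=256, num_classes=10):
        super().__init__()
        # One small FC encoder per input modality.
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden_dim), nn.ReLU()) for d in dims]
        )
        # Fully-connected layers over the concatenated intermediate features.
        self.fusion = nn.Sequential(
            nn.Linear(hidden_dim * len(dims), hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, *modalities):
        feats = [enc(x) for enc, x in zip(self.encoders, modalities)]
        return self.fusion(torch.cat(feats, dim=-1))

if __name__ == "__main__":
    net = ConcatFusionNet()
    out = net(torch.randn(4, 128), torch.randn(4, 64))  # e.g. EEG + video features
    print(out.shape)  # torch.Size([4, 10])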