Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge 2017
DOI: 10.1145/3133944.3133948

Multimodal Measurement of Depression Using Deep Learning Models

Cited by 119 publications (93 citation statements)
References 20 publications
“…However, deep learning techniques that learn discriminant feature representations from training data are considered to be state-of-the-art in depression recognition. Deep learning models for video-based depression detection often cascade a 2D CNN with an RNN [17]. Most notably, Jan et al. [38] propose deep learning techniques to extract features from facial frames and employ a feature dynamic history histogram (FDHH) to capture variations in the features.…”
Section: Related Work
confidence: 99%
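The FDHH descriptor mentioned above can be illustrated with a minimal sketch: threshold frame-to-frame feature changes into a binary dynamic map, then histogram the lengths of consecutive runs of change per component. This is a simplified reading of the technique, not the paper's implementation; the threshold and maximum run length below are illustrative placeholders.

```python
import numpy as np

def fdhh(features, threshold=0.1, max_run=3):
    """Simplified feature dynamic history histogram (FDHH).

    features: (T, C) array of per-frame feature vectors.
    Returns a flattened (max_run * C,) descriptor counting, per
    component, how often exactly m consecutive frame-to-frame
    changes exceed `threshold` (runs longer than max_run are
    clamped into the last bin).
    """
    # Binary dynamic map: 1 where the feature changed "enough".
    moved = np.abs(np.diff(features, axis=0)) > threshold  # (T-1, C)
    hist = np.zeros((max_run, features.shape[1]), dtype=int)
    for c in range(features.shape[1]):
        run = 0
        for bit in moved[:, c]:
            if bit:
                run += 1
            elif run:
                hist[min(run, max_run) - 1, c] += 1
                run = 0
        if run:  # close a run that reaches the last frame
            hist[min(run, max_run) - 1, c] += 1
    return hist.ravel()  # one fixed-length descriptor per video
```

The appeal of the approach is that a variable-length frame sequence collapses into one fixed-length vector that a standard regressor can map to a depression score.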
“…Deep learning architectures, and in particular CNNs, provide state-of-the-art performance in many visual recognition applications, such as image classification [14] and object detection [15], as well as assisted medical diagnosis [16]. In depression detection, deep learning architectures that operate on videos typically exploit spatial and temporal information separately (e.g., by cascading a 2D CNN and then a recurrent NN), which deteriorates the modeling of spatio-temporal relationships [11], [17]. A deep two-stream architecture has also been proposed to exploit facial appearance and facial optical flow [10].…”
Section: Introduction
confidence: 99%
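The 2D CNN → RNN cascade described above can be sketched structurally: a per-frame spatial feature extractor feeds a recurrent cell, whose final hidden state is regressed to a score. This is a toy NumPy sketch with frozen random weights standing in for trained networks; all dimensions and names are hypothetical, not taken from any cited model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: T frames of PIX flattened pixels, D-dim spatial
# features, H-dim recurrent state.
T, PIX, D, H = 8, 64, 16, 8
W_cnn = rng.normal(size=(PIX, D)) * 0.1   # stand-in for a per-frame 2D CNN
W_xh = rng.normal(size=(D, H)) * 0.1      # RNN input weights
W_hh = rng.normal(size=(H, H)) * 0.1      # RNN recurrent weights
w_out = rng.normal(size=H)                # linear regression head

def predict_score(video):
    """Cascade: spatial features per frame, then temporal modeling,
    then a linear head mapping the last hidden state to a score."""
    h = np.zeros(H)
    for frame in video:                    # frame: flattened (PIX,) image
        x = np.maximum(frame @ W_cnn, 0)   # spatial stage (ReLU features)
        h = np.tanh(x @ W_xh + h @ W_hh)   # temporal stage (vanilla RNN)
    return float(h @ w_out)                # e.g., a depression-scale estimate

video = rng.normal(size=(T, PIX))
score = predict_score(video)
```

The criticism quoted above is visible in the structure: the spatial stage is computed frame by frame with no access to motion, so spatio-temporal interactions are only modeled indirectly through the recurrent state.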
“…In a similar spirit, Sun et al. modeled depression by structuring questions and responses in the form of a decision tree [7,8], while Gong et al. developed an ensemble of audio, text, and video features as a function of the question type asked [9]. Utilizing a deep learning framework, Yang et al. (2017) combined multiple modalities conditioned on manually selected questions [10].…”
Section: Introduction
confidence: 99%
“…Although the use of different scales for training (e.g., Yang et al. 2017 used PHQ-8 scores) can introduce potential confounds for performance comparisons, we scaled our model's MAE and RMSE values to obtain an error percentage. When we compared error percentages, we found that our model fared better in terms of MAE compared to previous models and performed comparably with respect to RMSE percentage (L. Yang et al. 2017). It is noteworthy that this difference in performance might arise because of a difference in the distribution of the original scores (whether the datasets contain more depressed participants).…”
Section: Results
confidence: 68%
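The scaling step described in that statement is simple arithmetic: divide the error by the range of the rating scale so that errors on different instruments become comparable percentages. The scale ranges below are the standard ones (PHQ-8: 0–24, BDI-II: 0–63); the MAE values are made-up illustrations, not results from the cited work.

```python
def error_pct(mae, scale_range):
    """Express an absolute error as a percentage of the scale's range."""
    return 100.0 * mae / scale_range

# Hypothetical errors on two different depression scales:
phq_pct = error_pct(4.0, 24)   # PHQ-8 ranges over 0-24
bdi_pct = error_pct(7.5, 63)   # BDI-II ranges over 0-63
```

After normalization, a raw MAE of 4.0 on PHQ-8 (about 16.7%) is directly comparable to a raw MAE of 7.5 on BDI-II (about 11.9%), even though the unscaled numbers point the other way.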
“…In parallel, ML techniques have been applied to examine affective display differences exhibited during emotion states, such as facial expression and vocal prosody, through audio- and video-based analysis. These advances have generated a new field of research which has successfully used ML techniques, such as support vector machines (Cohn et al. 2009), regression (Valstar et al. 2013), and neural networks (shallow and deep; L. Yang et al. 2017), for automatic recognition of emotion using audiovisual data from conventional databases (Schuller, Steidl, and Batliner 2009; Burkhardt et al. 2005) and recently more naturalistic environments (Dhall et al. 2013; McKeown et al. 2012; Ringeval et al. 2013). Moreover, ML has also been extended to investigate verbal and nonverbal affective abnormalities associated with psychiatric disorders and has gone on to successfully classify those presenting with and without a given diagnosis (Hamm et al. 2011; P.…”
Section: Introduction
confidence: 99%