2023
DOI: 10.1109/taffc.2020.3031345
Multimodal Spatiotemporal Representation for Automatic Depression Level Detection

Cited by 51 publications (21 citation statements)
References 56 publications
“…In [33], they tried LLDs of different lengths and suggested that 20 s is sufficient to obtain good performance. In [163], the authors sampled the waveforms at 8 kHz and generated the 129-dimensional normalized amplitude spectrogram using a short-time Fourier transform with a 32 ms Hamming window and 16 ms frame shift for the AVEC2013 and AVEC2014 databases.…”

Section: Preprocessing
Mentioning confidence: 99%
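The spectrogram recipe quoted above is fully determined by its parameters: at 8 kHz, a 32 ms window is 256 samples (an FFT of that length yields 129 frequency bins) and a 16 ms shift is a 128-sample hop. A minimal sketch of that pipeline using `scipy.signal.stft` is shown below; the exact normalization used in [163] is not specified in the quote, so simple max-scaling is assumed here for illustration.

```python
import numpy as np
from scipy.signal import stft

def amplitude_spectrogram(waveform, fs=8000):
    """Normalized amplitude spectrogram per the quoted setup:
    32 ms Hamming window (256 samples at 8 kHz), 16 ms frame
    shift (128-sample hop), giving 129 frequency bins."""
    _, _, Zxx = stft(waveform, fs=fs, window='hamming',
                     nperseg=256, noverlap=128)
    amp = np.abs(Zxx)              # shape: (129, n_frames)
    amp /= (amp.max() + 1e-8)      # assumed normalization: scale to [0, 1]
    return amp

# One second of synthetic audio at 8 kHz.
x = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
spec = amplitude_spectrogram(x)
print(spec.shape[0])  # 129 frequency bins
```

With `nperseg=256` and `noverlap=128`, successive frames advance by 128 samples, matching the 16 ms shift the citing papers describe.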
“…Besides the handcrafted methods, De Melo et al [22] proposed to downsample the video into a small set of frames which roughly represent the video-level information, which was then fed to 3D CNNs to learn a video-level depression representation. Niu et al [25] proposed a spatio-temporal attention network to integrate the facial appearance and short-term facial dynamics. Then, an eigen-evolution pooling strategy is introduced to aggregate thin slice-level features into the video-level descriptor.…”

Section: Video-based Automatic Depression Analysis
Mentioning confidence: 99%
“…The main contributions and benefits of our approach in comparison with the existing depression recognition approaches are the following: (i) in contrast to existing single-stage approaches that focus on modelling depression either at frame/thin slice-level [13], [14], [15], [16] or at video-level [18], [22], [25], we propose a two-stage framework that takes advantage of both short-term and video-level behaviours for depression recognition; (ii) the framework is designed to utilize all available frames to predict depression, distinguishing it from other video-level modelling methods [22] that discard frames carrying crucial information; (iii) while widely-used C3D-based approaches [15], [22], [36] only learn depression features at a single temporal scale, the proposed short-term depressive behaviour modelling stage can explicitly encode depression-related facial behaviour features at multiple temporal scales.…”

Section: The Proposed Two-stage Approach
Mentioning confidence: 99%