2023
DOI: 10.1109/taffc.2020.3031345
Multimodal Spatiotemporal Representation for Automatic Depression Level Detection

Cited by 51 publications (21 citation statements)
References 56 publications
“…In [33], they tried LLDs of different lengths and suggested that 20 s is sufficient to obtain good performance. In [163], the authors sampled the waveforms at 8 kHz and generated the 129-dimensional normalized amplitude spectrogram using a short-time Fourier transform with a 32 ms Hamming window and 16 ms frame shift for the AVEC2013 and AVEC2014 databases.…”

Section: Preprocessing
Mentioning confidence: 99%
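The spectrogram recipe quoted above is fully determined by its parameters: at 8 kHz, a 32 ms window is 256 samples (an FFT of that length yields 129 frequency bins) and a 16 ms shift is a 128-sample hop. A minimal sketch of that pipeline using `scipy.signal.stft` is shown below; the exact normalization used in [163] is not specified in the quote, so simple max-scaling is assumed here for illustration.

```python
import numpy as np
from scipy.signal import stft

def amplitude_spectrogram(waveform, fs=8000):
    """Normalized amplitude spectrogram per the quoted setup:
    32 ms Hamming window (256 samples at 8 kHz), 16 ms frame
    shift (128-sample hop), giving 129 frequency bins."""
    _, _, Zxx = stft(waveform, fs=fs, window='hamming',
                     nperseg=256, noverlap=128)
    amp = np.abs(Zxx)              # shape: (129, n_frames)
    amp /= (amp.max() + 1e-8)      # assumed normalization: scale to [0, 1]
    return amp

# One second of synthetic audio at 8 kHz.
x = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
spec = amplitude_spectrogram(x)
print(spec.shape[0])  # 129 frequency bins
```

With `nperseg=256` and `noverlap=128`, successive frames advance by 128 samples, matching the 16 ms shift the citing papers describe.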
“…Besides the handcrafted methods, De Melo et al [22] proposed to downsample the video into a small set of frames which roughly represent the video-level information, which was then fed to 3D CNNs to learn a video-level depression representation. Niu et al [25] proposed a spatio-temporal attention network to integrate the facial appearance and short-term facial dynamics. Then, an eigen-evolution pooling strategy is introduced to aggregate thin slice-level features into the video-level descriptor.…”

Section: Video-based Automatic Depression Analysis
Mentioning confidence: 99%
“…The main contributions and benefits of our approach in comparison with the existing depression recognition approaches are the following: (i) in contrast to existing single-stage approaches that focus on modelling depression either at frame/thin slice-level [13], [14], [15], [16] or at video-level [18], [22], [25], we propose a two-stage framework that takes advantage of both short-term and video-level behaviours for depression recognition; (ii) the framework is designed to utilize all available frames to predict depression, distinguishing it from other video-level modelling methods [22] that discard frames carrying crucial information; (iii) while widely-used C3D-based approaches [15], [22], [36] only learn depression features at a single temporal scale, the proposed short-term depressive behaviour modelling stage can explicitly encode depression-related facial behaviour features at multiple temporal scales.…”

Section: The Proposed Two-stage Approach
Mentioning confidence: 99%