“…Since an early study [9] showed that mid- and low-level facial attributes (e.g., facial action units (AUs) and facial landmarks) are informative about depression status, a number of recent studies have sought to recognize depression from automatically detected facial attributes such as facial landmarks [10], [11], [12], gaze direction [13], [14], AUs [15], [16], [17], and head poses [10]. While some of these studies compute statistics [18], [19] (e.g., displacement, velocity, acceleration) over facial-attribute time-series as the clip-level representation for depression recognition, recent deep-learning models (e.g., 1D-CNNs [16], [17], LSTMs [20], attention-based temporal CNNs [21], causal CNNs [22]) have also been applied to infer depression from facial-attribute time-series, achieving better results than most hand-crafted approaches.…”
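To make the hand-crafted clip-level representation concrete, the sketch below illustrates one plausible way to summarise a facial-landmark time-series by the statistics mentioned above (displacement, velocity, acceleration). The function name, array shapes, and the choice of mean/std as summary statistics are illustrative assumptions, not the exact pipelines of the cited works.

```python
import numpy as np

def clip_level_features(landmarks: np.ndarray) -> np.ndarray:
    """Summarise a facial-landmark time-series as one clip-level vector.

    landmarks: array of shape (T, D) -- T frames, D flattened coordinates.
    Returns per-dimension mean and std of displacement, velocity and
    acceleration, concatenated into a single feature vector of size 6 * D.
    (Illustrative sketch; the cited papers may use other statistics.)
    """
    disp = landmarks - landmarks[0]        # displacement from the first frame
    vel = np.diff(landmarks, n=1, axis=0)  # first difference: velocity
    acc = np.diff(landmarks, n=2, axis=0)  # second difference: acceleration
    stats = []
    for series in (disp, vel, acc):
        stats.append(series.mean(axis=0))
        stats.append(series.std(axis=0))
    return np.concatenate(stats)

# Toy usage: 100 frames of 68 two-dimensional landmarks (136 dims per frame).
feats = clip_level_features(np.random.randn(100, 136))
print(feats.shape)  # (816,)
```

Such fixed-size vectors can then be fed to any standard classifier or regressor, which is what distinguishes these hand-crafted pipelines from the end-to-end temporal models (1D-CNNs, LSTMs) that consume the raw time-series directly.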