2019
DOI: 10.1109/taffc.2017.2695999
Multi-Objective Based Spatio-Temporal Feature Representation Learning Robust to Expression Intensity Variations for Facial Expression Recognition

Cited by 189 publications (119 citation statements)
References 37 publications

“…• Learning temporal feature representations. To learn representations of the temporal dynamics found in audio [63], sequences of images [64], and physiological measurements [55], DNNs and especially RNNs are successfully applied.…”
Section: Towards Learning Deep Models of Affect (mentioning)
confidence: 99%
“…More recent studies combine RNNs with deep methods for spatial feature learning discussed in Section 3.1.1, by adopting deep features from the last layer of a CNN trained for affect recognition (e.g., [106], [18], [64]). We see both CNN-RNN (e.g., [18], [90]) and CNN-LSTM (e.g., [108], [105], [64]) architectures, with CNN-LSTM being the more frequent choice among the reviewed studies. Global temporal modeling is found to lead to improved accuracies when compared with simpler methods such as pooling of spatial features (e.g., [152], [90], [108]).…”
Section: Learning Temporal Features for FER (mentioning)
confidence: 99%
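The CNN-LSTM cascade described in the excerpt above (per-frame spatial features from a CNN, followed by global temporal modeling with an LSTM) can be illustrated with a minimal PyTorch sketch. The ResNet-18 backbone, hidden size, and seven-class output are assumptions for illustration, not the architecture of any cited work.

```python
# Minimal CNN-LSTM sketch for sequence-based facial expression recognition.
# Backbone, hidden size and class count are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models


class CnnLstmFER(nn.Module):
    def __init__(self, num_classes=7, hidden_size=128):
        super().__init__()
        # Frame-level spatial features from a CNN (classifier head removed).
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features          # 512 for ResNet-18
        backbone.fc = nn.Identity()
        self.cnn = backbone
        # Global temporal modeling over the per-frame features.
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                       # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        frames = clips.flatten(0, 1)                # (B*T, 3, H, W)
        feats = self.cnn(frames).view(b, t, -1)     # (B, T, feat_dim)
        _, (h_n, _) = self.lstm(feats)
        return self.classifier(h_n[-1])             # one logit vector per clip


# Example: a batch of 2 clips, each with 16 frames of 224x224 face crops.
logits = CnnLstmFER()(torch.randn(2, 16, 3, 224, 224))
print(logits.shape)                                 # torch.Size([2, 7])
```

Taking the last LSTM hidden state as the clip-level representation is one simple form of the global temporal modeling the excerpt contrasts with plain pooling of spatial features.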
“…Face alignment is a traditional pre-processing step in many face-related recognition tasks. We list some well-known approaches, e.g., 3000 fps [64] and Incremental [65], deep-learning detectors such as the cascaded CNN [67] and MTCNN [69], and publicly available implementations that are widely used in deep FER [comparison table of detectors [16], [55], [63], [66], [68], [70], [71] with landmark counts, speed, and accuracy omitted]. Given a series of training data, the first step is to detect the face and then to remove background and non-face areas.…”
Section: Face Alignment (mentioning)
confidence: 99%
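As a rough illustration of the pre-processing step described above (detect the face, then remove background and non-face areas), the sketch below uses the facenet-pytorch implementation of MTCNN [69]. The package choice, crop size, margin, and file path are assumptions, not the configuration of the cited detectors.

```python
# Sketch of face detection and cropping as a FER pre-processing step,
# using the facenet-pytorch MTCNN implementation. Crop size, margin and
# the frame path are illustrative assumptions.
from facenet_pytorch import MTCNN
from PIL import Image

detector = MTCNN(image_size=224, margin=20)      # 5-landmark detector/cropper

img = Image.open("frame_0001.png").convert("RGB")  # hypothetical frame path
face = detector(img)          # cropped, resized face tensor, or None
if face is None:
    print("no face detected; frame would be skipped or padded")
else:
    print(face.shape)         # torch.Size([3, 224, 224])
```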
“…Cascaded networks: By combining the powerful perceptual vision representations learned from CNNs with the strength of LSTM for variable-length inputs and outputs, Donahue et al [204] proposed a both spatially and temporally deep model which cascades the outputs of CNNs with LSTMs for various vision tasks involving time-varying inputs and outputs. Similar to this hybrid network, many cascaded networks have been proposed for FER (e.g., [66], [108], [190], [205]).…”
Section: RNN and C3D (mentioning)
confidence: 99%
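One property the excerpt attributes to the cascaded CNN-LSTM design is handling variable-length inputs. A minimal sketch of that step, assuming PyTorch and per-frame features already extracted by a CNN (feature dimension, hidden size, and clip lengths are illustrative):

```python
# Feeding variable-length per-frame feature sequences to an LSTM via packing.
# Dimensions and clip lengths are illustrative assumptions.
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

feat_dim, hidden = 512, 128
lstm = nn.LSTM(feat_dim, hidden, batch_first=True)

# Two clips of different length, already encoded frame-by-frame by a CNN.
lengths = torch.tensor([16, 9])
feats = torch.zeros(2, 16, feat_dim)
feats[0] = torch.randn(16, feat_dim)
feats[1, :9] = torch.randn(9, feat_dim)

packed = pack_padded_sequence(feats, lengths, batch_first=True,
                              enforce_sorted=False)
_, (h_n, _) = lstm(packed)    # h_n[-1]: one clip-level vector per sequence
print(h_n[-1].shape)          # torch.Size([2, 128])
```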
“…Recently, a few research efforts have been made regarding facial dynamic feature encoding for a facial analysis [9,25,6,24]. It is generally known that the dynamic features of local regions are valuable for facial trait estimation [9,6]. Usually, the motion of facial local region in facial expression is related to the motion of other facial regions [39,43].…”
Section: Introduction (mentioning)
confidence: 99%