2019
DOI: 10.1142/s0219843619410020
CNN-Based Facial Expression Recognition from Annotated RGB-D Images for Human–Robot Interaction

Abstract: Facial expression recognition has been widely used in human–computer interaction (HCI) systems. Over the years, researchers have proposed different feature descriptors, implemented different classification methods, and carried out a number of experiments on various datasets for automatic facial expression recognition. However, most of them used 2D static images or 2D video sequences for the recognition task. The main limitations of 2D-based analysis are problems associated with variations in pose and illuminat…
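The truncated abstract points to a CNN operating on RGB-D face images. As a hedged illustration only (the paper's actual layer sizes, fusion strategy, and training details are not given here), a minimal PyTorch sketch of late fusion between an RGB stream and a depth stream might look like this:

```python
# Hypothetical sketch of a two-stream CNN that fuses RGB and depth face
# crops for expression classification. This is NOT the paper's
# architecture; all layer sizes and the fusion strategy are assumptions.
import torch
import torch.nn as nn

class StreamCNN(nn.Module):
    """Small convolutional feature extractor for one modality."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.AdaptiveAvgPool2d(1),              # global average pooling
        )

    def forward(self, x):
        return self.features(x).flatten(1)        # (B, 64)

class RGBDExpressionNet(nn.Module):
    """Late fusion of an RGB stream and a depth stream."""
    def __init__(self, num_classes=7):            # e.g. 7 basic expressions
        super().__init__()
        self.rgb_stream = StreamCNN(in_channels=3)
        self.depth_stream = StreamCNN(in_channels=1)
        self.classifier = nn.Linear(64 + 64, num_classes)

    def forward(self, rgb, depth):
        feats = torch.cat([self.rgb_stream(rgb), self.depth_stream(depth)], dim=1)
        return self.classifier(feats)

net = RGBDExpressionNet()
logits = net(torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64))
print(logits.shape)  # torch.Size([4, 7])
```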

Cited by 33 publications (13 citation statements); references 18 publications.
“…In the area of human pose estimation, researchers designed OpenPose, which detects multiple human poses in a single image in real time, predicts joint-point confidence maps and part-affinity fields across multi-camera scenes, and matches 3D information from the 2D pose detections, achieving good robustness. Purnama and Sari put forward a method that fits a dense 3D human body with the skinned multi-person linear model (SMPL), constructing a DensePose network to regress the human body surface and map the body pixels of an image onto the 3D body surface [8]. Li et al. proposed a real-time, stable pose-estimation method based on a monocular camera, with a high real-time detection speed; however, in some scenes the joint predictions are not accurate enough to resolve limb occlusion [9]. He et al. propose using a Kinect camera and a color camera to synchronously capture facial expression and body posture, scanning the character model with KinectFusion, estimating pose with a robust ICP method, and reconstructing the face to obtain facial animation [10]. However, Kinect acquisition easily introduces noise, resulting in jitter.…”
Section: Literature Review (mentioning)
confidence: 99%
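The robust ICP variant referenced in [10] is not specified here. Purely to illustrate the alternating match-and-align loop that such methods build on, a minimal, non-robust point-to-point ICP sketch in NumPy follows:

```python
# Minimal point-to-point ICP sketch (NumPy). The cited work uses a
# robust ICP variant on Kinect data; this basic version only shows
# the core idea: alternate nearest-neighbour matching with a
# closed-form rigid alignment until the pose converges.
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping matched
    src points onto dst, via the SVD (Kabsch) solution."""
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # fix an improper reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, dst_c - R @ src_c

def icp(src, dst, iters=20):
    """Align point cloud src to dst by alternating brute-force
    nearest-neighbour matching and rigid alignment."""
    cur = src.copy()
    for _ in range(iters):
        d = np.linalg.norm(cur[:, None] - dst[None], axis=2)  # (N, M)
        matched = dst[d.argmin(1)]       # nearest neighbour per point
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
    return cur
```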
“…MCT-based gesture-sensing technology generally uses surface electromyography (EMG) electrodes to collect EMG signals and recognises gestures from those signals [25]. Depending on the electrodes used, EMG-based gesture recognition divides into two categories: one based on sparse multi-channel EMG signals and the other on high-density EMG signals [26]. For sparse multi-channel EMG signals, gesture recognition is generally formulated as a signal-sequence classification problem [27].…”
Section: Related Work (mentioning)
confidence: 99%
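As a rough illustration of framing sparse multi-channel EMG gesture recognition as signal-sequence classification, here is a hedged Python sketch; the window length, the RMS/MAV features, the synthetic data, and the SVM classifier are assumptions, not the cited works' choices:

```python
# Sketch: sparse multi-channel EMG gesture classification as a
# signal-sequence classification problem. Windowed time-domain
# features (RMS and mean absolute value per channel) feed an SVM.
import numpy as np
from sklearn.svm import SVC

def window_features(emg, win=200, step=100):
    """emg: (samples, channels) array -> (windows, 2*channels) features."""
    feats = []
    for start in range(0, len(emg) - win + 1, step):
        w = emg[start:start + win]
        rms = np.sqrt((w ** 2).mean(axis=0))     # root mean square
        mav = np.abs(w).mean(axis=0)             # mean absolute value
        feats.append(np.concatenate([rms, mav]))
    return np.array(feats)

# Synthetic stand-in data: two "gestures" on 8 sparse channels.
rng = np.random.default_rng(0)
rest = window_features(0.1 * rng.standard_normal((2000, 8)))
grip = window_features(0.5 * rng.standard_normal((2000, 8)))
X = np.vstack([rest, grip])
y = np.array([0] * len(rest) + [1] * len(grip))

clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))   # training accuracy on the toy data
```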
“…For the static recognition task [8]–[10], semantic information is encoded from a single image input, while for the dynamic recognition task [11], [12], the hidden-layer representations depend on the temporal relations among contiguous frames of the input facial-expression sequence. In 2019, J. Li et al. [13] used a Microsoft Kinect to collect an RGB-D dataset and applied a two-stream network to the dynamic recognition task. In this paper, we limit our discussion to FER as a static recognition task.…”
Section: A. The Facial Expression Recognition Task (mentioning)
confidence: 99%
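To make the static-versus-dynamic distinction above concrete, here is a generic PyTorch sketch of a dynamic-recognition model: a shared per-frame CNN encoder whose features an LSTM aggregates across contiguous frames. It is an illustration only, not the two-stream network of [13]:

```python
# Generic dynamic-FER sketch: per-frame CNN features aggregated over
# time by an LSTM. Shapes and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DynamicFER(nn.Module):
    def __init__(self, num_classes=7, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(           # shared per-frame CNN
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.temporal = nn.LSTM(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, clip):                    # clip: (B, T, 3, H, W)
        B, T = clip.shape[:2]
        feats = self.encoder(clip.flatten(0, 1)).view(B, T, -1)
        out, _ = self.temporal(feats)           # temporal relation across frames
        return self.head(out[:, -1])            # classify from the last step

model = DynamicFER()
print(model(torch.randn(2, 16, 3, 64, 64)).shape)  # torch.Size([2, 7])
```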