In this paper, we propose a novel adaptive deep disturbance-disentangled learning (ADDL) method for effective facial expression recognition (FER). ADDL involves a two-stage learning procedure. First, a disturbance feature extraction model (DFEM) is trained to identify multiple disturbing factors on a large-scale face database involving disturbance label information. Second, an adaptive disturbance-disentangled model (ADDM), which contains a global shared subnetwork and two task-specific subnetworks, is designed and learned to explicitly disentangle disturbing factors from facial expression images. In particular, the expression subnetwork leverages a multi-level attention mechanism to extract expression-specific features, while the disturbance subnetwork embraces a new adaptive
One of the main challenges in facial expression recognition (FER) is to address the disturbance caused by various disturbing factors, including common ones (such as identity, pose, and illumination) and potential ones (such as hairstyle, accessory, and occlusion). Recently, a number of FER methods have been developed to explicitly or implicitly alleviate the disturbance involved in facial images. However, these methods either consider only a few common disturbing factors or neglect the prior information of these disturbing factors, thus resulting in inferior recognition performance. In this paper, we propose a novel Dual-branch Disturbance Disentangling Network (D³Net), mainly consisting of an expression branch and a disturbance branch, to perform effective FER. In the disturbance branch, a label-aware sub-branch (LAS) and a label-free sub-branch (LFS) are elaborately designed to cope with different types of disturbing factors. On the one hand, LAS explicitly captures the disturbance due to some common disturbing factors by transfer learning on a pretrained model. On the other hand, LFS implicitly encodes the information of potential disturbing factors in an unsupervised manner. In particular, we introduce an Indian buffet process (IBP) prior to model the distribution of potential disturbing factors in LFS. Moreover, we leverage adversarial training to increase the differences between disturbance features and expression features, thereby enhancing the disentanglement of disturbing factors. By disentangling the disturbance from facial images, we are able to extract discriminative expression features. Extensive experiments demonstrate that our proposed method performs favorably
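As a rough illustration of the IBP prior mentioned above, the sketch below draws a binary activation mask over latent factors using the standard truncated stick-breaking construction of the Indian buffet process. It is not the authors' implementation: the function name `sample_ibp_mask`, the truncation level `num_factors`, and the concentration value `alpha` are all assumptions chosen for the example, and the abstract does not say how the prior is parameterized or trained in LFS.

```python
import random


def sample_ibp_mask(num_factors, alpha, rng):
    """Draw one binary mask from a truncated stick-breaking IBP prior.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not taken from the paper.
    """
    probs = []
    mask = []
    stick = 1.0
    for _ in range(num_factors):
        # nu_k ~ Beta(alpha, 1); pi_k is the running product of the nu values,
        # so activation probabilities shrink as the factor index grows.
        stick *= rng.betavariate(alpha, 1.0)
        probs.append(stick)
        # z_k ~ Bernoulli(pi_k): whether latent factor k is active.
        mask.append(1 if rng.random() < stick else 0)
    return probs, mask


rng = random.Random(0)
probs, mask = sample_ibp_mask(num_factors=8, alpha=3.0, rng=rng)
```

Because the activation probabilities decay with the factor index, a prior of this kind lets a model keep only as many latent disturbance factors as the data supports, which is one plausible reason to prefer it over a fixed-size latent code.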