Head pose estimation has received considerable attention in computer vision. This work presents a framework based on an adaptive graph convolution network (AGCN) that processes both 2D and 3D facial landmarks extracted from the input RGB image. The network has a two-stream architecture (a teacher 3D stream and a student 2D stream) and is trained with a 3D-to-2D knowledge distillation process that transfers features from the 3D stream to the 2D stream to improve its performance. Several processing modules, such as depth denoising for the detected 3D landmarks and multi-stream fusion at inference, are also proposed to further improve the prediction performance and robustness of the method. In the experiments, we follow standard protocols (in terms of datasets and metrics) and evaluate on three datasets: 300W-LP, AFLW2000, and BIWI. Performance is measured by mean absolute error (MAE). The proposed method achieves better performance than most state-of-the-art methods.
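As an illustration of the evaluation metric, the following minimal sketch computes MAE for head pose predictions. It assumes the common convention that a pose is given as (yaw, pitch, roll) Euler angles in degrees and that the overall score is the mean of the three per-angle errors; the abstract does not spell out these details, and the function name `pose_mae` and the sample values are hypothetical.

```python
def pose_mae(predictions, ground_truth):
    """Return (per-angle MAE, overall MAE) for sequences of (yaw, pitch, roll).

    Assumption: angles are in degrees and the overall MAE is the mean of the
    yaw, pitch, and roll errors.
    """
    if not predictions or len(predictions) != len(ground_truth):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(predictions)
    per_angle = []
    for axis in range(3):  # 0 = yaw, 1 = pitch, 2 = roll
        total = sum(abs(p[axis] - g[axis])
                    for p, g in zip(predictions, ground_truth))
        per_angle.append(total / n)

    overall = sum(per_angle) / 3
    return per_angle, overall


# Made-up sample values, for illustration only.
pred = [(10.0, -5.0, 2.0), (0.0, 4.0, -1.0)]
gt = [(12.0, -4.0, 2.0), (-2.0, 1.0, 1.0)]
per_angle, overall = pose_mae(pred, gt)
print(per_angle, overall)
```

Reporting the per-angle errors alongside the overall mean makes it easier to see which rotation axis contributes most to the error.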