FAB: A Robust Facial Landmark Detection Framework for Motion-Blurred Videos

Sun, Kang; Wu, Wayne; Liu, Tinghao; Yang, Shuo; Wang, Quan; Zhou, Qiang; Ye, Zuochang; Qian, Chen

doi:10.1109/iccv.2019.00556

Cited by 30 publications

(26 citation statements)

References 41 publications


“…The proposed method is compared to other state-of-the-art landmark localization methods. From these approaches, coordinate regression methods include SDM [35], TSCN [26], IFA [1], CFSS [40], TCDCN [38], TSTN [14], DSRN [17], ODN [39], STA [30], Sun et al.'s work [27] and GAN [36]. Heatmap regression methods include Newell et al.'s work [18], SAN [9], LAB [32], CNN-CRF [5], LaplaceKL [23], Sun et al.'s work [28], DSNT [19], DARK [37], FHR [30], GHCU [15] [10,11,21] for landmark detection are trained under different conditions with our method so are not included in our comparison.…”

Section: Comparison With State-of-the-art Methods (mentioning)

confidence: 99%

Attentive One-Dimensional Heatmap Regression for Facial Landmark Detection and Tracking

Yin, Wang, Chen, et al. 2020

Proceedings of the 28th ACM International Conference on Multimedia


Although heatmap regression is considered a state-of-the-art method to locate facial landmarks, it suffers from huge spatial complexity and is prone to quantization error. To address this, we propose a novel attentive one-dimensional heatmap regression method for facial landmark localization. First, we predict two groups of 1D heatmaps to represent the marginal distributions of the x and y coordinates. These 1D heatmaps reduce spatial complexity significantly compared to current heatmap regression methods, which use 2D heatmaps to represent the joint distributions of x and y coordinates. With much lower spatial complexity, the proposed method can output high-resolution 1D heatmaps despite limited GPU memory, significantly alleviating the quantization error. Second, a co-attention mechanism is adopted to model the inherent spatial patterns existing in x and y coordinates, and therefore the joint distributions on the x and y axes are also captured. Third, based on the 1D heatmap structures, we propose a facial landmark detector capturing spatial patterns for landmark detection on an image; and a tracker further capturing temporal patterns with a temporal refinement mechanism for landmark tracking. Experimental results on four benchmark databases demonstrate the superiority of our method. CCS Concepts: Computing methodologies → Biometrics.
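The spatial-complexity argument in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`gaussian_1d`, `decode_landmark`, `storage_cost`), the Gaussian shape of the heatmaps, and the plain argmax decoding are all assumptions made for illustration, and the co-attention mechanism is omitted entirely.

```python
import math


def gaussian_1d(length, center, sigma=1.5):
    """Hypothetical helper: a normalized 1D Gaussian heatmap peaked at `center`."""
    values = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(length)]
    total = sum(values)
    return [v / total for v in values]


def decode_landmark(heatmap_x, heatmap_y):
    """Decode one landmark as the argmax of each 1D (marginal) heatmap."""
    x = max(range(len(heatmap_x)), key=heatmap_x.__getitem__)
    y = max(range(len(heatmap_y)), key=heatmap_y.__getitem__)
    return x, y


def storage_cost(width, height):
    """Values stored per landmark: (one 2D joint heatmap, a pair of 1D heatmaps)."""
    return width * height, width + height


# Assumed 256x256 output resolution, chosen only for the example.
hx = gaussian_1d(256, center=100)
hy = gaussian_1d(256, center=40)
print(decode_landmark(hx, hy))   # (100, 40)
print(storage_cost(256, 256))    # (65536, 512)
```

Under these assumptions the pair of 1D heatmaps stores W + H values per landmark instead of W × H, which is why a higher output resolution becomes affordable.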



“…High temporal resolution: ECs can provide up to MHz sampling resolution with high speed input motion, and with proportionally low latency. The stream of events from ECs does not suffer from motion blur [54], which is often observed in images of fast head rotations, on the mouth during speech, or due to camera motion, avoiding the need to implement costly de-blurring in face alignment [12]. ECs are therefore suitable in applications where motion provides the most relevant information, as in facial action recognition, voice activity detection and visual speech recognition, that must be robust to face pose variations.…”

Section: Discussion (mentioning)

confidence: 99%

“…At first, the detection of pose change activates the alignment, preventing unnecessary processing when the head doesn’t move. Alignment is then performed by regression cascade of tree ensembles, exploiting their superior computational efficiency [9, 10] with respect to (possibly more accurate) state-of-the-art alignment methods based on deep neural networks (DNNs) [5, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]. In a scenario where energy efficiency is a matter of the utmost importance, a DNN-based alignment pre-processor might eclipse the energy efficiency advantage of ECs.…”

Section: Introduction (mentioning)

confidence: 99%

Face Pose Alignment with Event Cameras

Savran, Bartolozzi 2020

Sensors


The event camera (EC) is emerging as a bio-inspired sensor that can serve as an alternative or complementary vision modality, with the benefits of energy efficiency, high dynamic range, and high temporal resolution coupled with activity-dependent sparse sensing. In this study we investigate with ECs the problem of face pose alignment, which is an essential pre-processing stage for facial processing pipelines. EC-based alignment can unlock all these benefits in facial applications, especially where motion and dynamics carry the most relevant information, because the sensor responds to temporal change. We specifically aim at efficient processing by developing a coarse alignment method to handle large pose variations in facial applications. For this purpose, we have prepared, with annotations from multiple human annotators, a dataset of extreme head rotations with varying motion intensity. We propose a motion-detection-based alignment approach that generates activity-dependent pose-events and so prevents unnecessary computations in the absence of pose change. The alignment is realized by cascaded regression of extremely randomized trees. Since EC sensors perform temporal differentiation, we characterize the performance of the alignment in terms of different levels of head movement speeds and face localization uncertainty ranges as well as face resolution and predictor complexity. Our method obtained 2.7% alignment failure on average, whereas annotator disagreement was 1%. The promising coarse alignment performance on EC sensor data, together with a comprehensive analysis, demonstrates the potential of ECs in facial applications.
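The idea of running alignment only when pose change is detected can be sketched as a simple gate. This is an illustrative assumption, not the paper's detector: the per-window event counts, the fixed `threshold`, and the `align` callback are all hypothetical stand-ins, and the cascaded regression itself is not modeled.

```python
def gated_alignment(event_counts, threshold, align):
    """Call `align` only for time windows whose event count reaches `threshold`.

    event_counts: number of events observed in each consecutive time window
                  (hypothetical input; a real detector may use other cues).
    threshold:    assumed activity level that signals a pose change.
    align:        placeholder for the alignment step, called with the window index.
    """
    results = []
    for window_index, count in enumerate(event_counts):
        if count >= threshold:
            # Activity detected: emit a pose-event and run the alignment.
            results.append((window_index, align(window_index)))
        # Otherwise skip the window, so no alignment cost is paid.
    return results


# Windows 2 and 4 are the only ones with enough activity to trigger alignment.
counts = [0, 5, 120, 3, 200]
print(gated_alignment(counts, threshold=100, align=lambda i: f"pose@{i}"))
# [(2, 'pose@2'), (4, 'pose@4')]
```

With a gate like this, the cost of alignment scales with the amount of motion rather than with elapsed time.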


“…This is because FHR does not consider inter-frame temporal dependency, so it has difficulties addressing the problem of heavy occlusions in motion. FAB [40] aims to handle motion-blurred videos by utilizing eight residual blocks to build an hourglass network for predicting boundary maps. Additionally, two convolutional layers and four residual blocks are used to generate a de-blurred sharp image, which form a pre-activated Resnet-18 as FAB's replaceable facial landmark detection network for landmark detection.…”

Section: Related Work (mentioning)

confidence: 99%

Reasoning Structural Relation for Occlusion-Robust Facial Landmark Localization

Zhu, Li, Li, et al. 2021

Preprint


In facial landmark localization tasks, various occlusions heavily degrade the localization accuracy due to the partial observability of facial features. This paper proposes a structural relation network (SRN) for occlusion-robust landmark localization. Unlike most existing methods that simply exploit the shape constraint, the proposed SRN aims to capture the structural relations among different facial components. These relations can be considered a more powerful shape constraint against occlusion. To achieve this, a hierarchical structural relation module (HSRM) is designed to hierarchically reason the structural relations that represent both long- and short-distance spatial dependencies. Compared with existing network architectures, the HSRM can efficiently model the spatial relations by leveraging its geometry-aware network architecture, which reduces the semantic ambiguity caused by occlusion. Moreover, the SRN augments the training data by synthesizing occluded faces. To further extend our SRN for occluded video data, we formulate the occluded face synthesis as a Markov decision process (MDP).

