2020
DOI: 10.48550/arxiv.2002.04831
Preprint

End-to-End Face Parsing via Interlinked Convolutional Neural Networks

Abstract: Face parsing is an important computer vision task that requires accurate pixel segmentation of facial parts (such as eyes, nose, mouth, etc.), providing a basis for further face analysis, modification, and other applications. In this paper, we introduce a simple, end-to-end face parsing framework: STN-aided iCNN (STN-iCNN), which extends interlinked Convolutional Neural Network (iCNN) by adding a Spatial Transformer Network (STN) between the two isolated stages. The STN-iCNN uses the STN to provide a trainable…

Citations: cited by 2 publications (5 citation statements)
References: 33 publications
“…Region-based parsing takes the scale discrepancy into account and predicts each component respectively, which is advantageous in capturing elaborate details [12], [13], [14], [15], [16]. Zhou et al present an interlinked CNN that takes multi-scale images as input and allows bidirectional information passing [15].…”
Section: Related Work, A. Face Parsing (mentioning)
confidence: 99%
“…This method demonstrates high performance especially for hair segmentation [13]. Yin et al introduce the Spatial Transformer Network and build a training connection between traditional interlinked CNNs, which makes the end-to-end joint training process possible [14]. Nevertheless, this class of methods often neglect the correlation among components to characterize long range dependencies.…”
Section: Related Work, A. Face Parsing (mentioning)
confidence: 99%
“…We conduct experiments on the broadly acknowledged Helen dataset to demonstrate the superiority of the proposed model. To keep consistent with the previous works [6,4,38,5,33], we employ the overall F1 score to measure the performance, which is computed by combining the merged eyes, brows, nose and mouth categories. As Table 3 shows, Our model surpasses state-of-the-art methods and achieves 93.2% on this dataset.…”
Section: Comparison With the State-of-the-art (mentioning)
confidence: 99%
“…The region-based methods have been recently proposed to model the facial components separately [4,5,6], and achieved state-of-the-art performance on the current benchmarks. However, these methods are based on the individual information within each region, and the correlation among regions is not exploited yet to capture long range dependencies.…”
Section: Introduction (mentioning)
confidence: 99%