Haoqiang Fan scite author profile

A Point Set Generation Network for 3D Object Reconstruction from a Single Image

Generation of 3D data by deep neural networks has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations, and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Along with this problem arises a unique and interesting issue: the ground-truth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in the ground truth, we design an architecture, loss function, and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments, not only can our system outperform state-of-the-art methods on single-image-based 3D reconstruction benchmarks, but it also shows strong performance for 3D shape completion and promising ability in making multiple plausible predictions.
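The abstract does not name its loss function. As an illustration of how an unordered point cloud can be compared against a ground-truth shape, the following is a minimal sketch of a Chamfer-style distance, a common set-to-set distance for point clouds. The function names and the mean-normalized form are assumptions made for this example, not the paper's code.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Symmetric Chamfer-style distance between two point sets.

    Each point is matched to its nearest neighbour in the other set, and the
    squared distances are averaged in both directions. The result does not
    depend on the order in which points are listed.
    """
    if not pred or not target:
        raise ValueError("both point sets must be non-empty")
    pred_to_target = sum(min(squared_distance(p, q) for q in target) for p in pred)
    target_to_pred = sum(min(squared_distance(q, p) for p in pred) for q in target)
    return pred_to_target / len(pred) + target_to_pred / len(target)
```

Because the distance is defined on sets, permuting the predicted points leaves the value unchanged, which is the property a point cloud output needs. This brute-force version is quadratic in the number of points and is meant only to show the idea.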

PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation

Sun², Huang³ et al. 2020

351

254

DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation

Xiong, Fan³ et al. 2019

478

249

This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. Our proposed network starts from a single lightweight backbone and aggregates discriminative features through sub-network and sub-stage cascades, respectively. Based on multi-scale feature propagation, DFANet substantially reduces the number of parameters while still obtaining a sufficient receptive field and enhancing the model's learning ability, striking a balance between speed and segmentation performance. Experiments on the Cityscapes and CamVid datasets demonstrate the superior performance of DFANet, with 8× fewer FLOPs and 2× higher speed than existing state-of-the-art real-time semantic segmentation methods, while providing comparable accuracy. Specifically, it achieves 70.3% mean IoU on the Cityscapes test set with only 1.7 GFLOPs and a speed of 160 FPS on one NVIDIA Titan X card, and 71.3% mean IoU with 3.4 GFLOPs when inferring on a higher-resolution image.
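For readers unfamiliar with the mean IoU figure quoted above, here is a minimal sketch of how the metric is commonly computed from per-pixel class labels. The function name `mean_iou` and the flat-list input are assumptions for illustration; benchmark evaluations typically accumulate counts over a whole dataset and skip ignored labels, which this sketch does not do.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union across classes for flat label sequences.

    For each class c, IoU = |pred == c and target == c| / |pred == c or target == c|.
    Classes absent from both prediction and target are left out of the mean.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # A correctly labelled pixel counts once in both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mislabelled pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)
```

For example, with predictions `[0, 0, 1, 1]` and labels `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.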

Extensive Facial Landmark Localization with Coarse-to-Fine Convolutional Network Cascade

Zhou¹, Fan², Cao³ et al. 2013

312

194

We present a new approach to localizing extensive facial landmarks with a coarse-to-fine convolutional network cascade. Deep convolutional neural networks (DCNNs) have been successfully used in facial landmark localization for two advantages: 1) geometric constraints among facial points are implicitly utilized; 2) a huge amount of training data can be leveraged. However, in the task of extensive facial landmark localization, a large number of facial landmarks (more than 50 points) must be located in a unified system, which poses great difficulty for the structure design and training process of traditional convolutional networks. In this paper, we design a four-level convolutional network cascade, which tackles the problem in a coarse-to-fine manner. In our system, each network level is trained to locally refine a subset of the facial landmarks generated by previous network levels. In addition, each level predicts explicit geometric constraints (the position and rotation angles of a specific facial component) to rectify the inputs of the current network level. The combination of coarse-to-fine cascade and geometric refinement enables our system to locate extensive facial landmarks (68 points) accurately in the 300-W facial landmark localization challenge.

Disentangled Image Matting

Cai¹, Zhang², Fan³ et al. 2019

117

115

Most previous image matting methods require a roughly specified trimap as input, and estimate fractional alpha values for all pixels that are in the unknown region of the trimap. In this paper, we argue that directly estimating the alpha matte from a coarse trimap is a major limitation of previous methods, as this practice tries to address two difficult and inherently different problems at the same time: identifying true blending pixels inside the trimap region, and estimating accurate alpha values for them. We propose AdaMatting, a new end-to-end matting framework that disentangles this problem into two sub-tasks: trimap adaptation and alpha estimation. Trimap adaptation is a pixel-wise classification problem that infers the global structure of the input image by identifying definite foreground, background, and semi-transparent image regions. Alpha estimation is a regression problem that calculates the opacity value of each blended pixel. Our method handles these two sub-tasks separately within a single deep convolutional neural network (CNN). Extensive experiments show that AdaMatting has additional structure awareness and trimap fault tolerance. Our method achieves state-of-the-art performance on the Adobe Composition-1k dataset both qualitatively and quantitatively. It is also the current best-performing method on the alphamatting.com online evaluation for all commonly used metrics.

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation

Huang², Fan et al. 2021

168

Approaching human level facial landmark localization by deep learning

Fan¹, Zhou² 2016

Image and Vision Computing

107

Deep Fusion Network for Image Completion

Hong, Xiong², Ji³ et al. 2019

Figure 1: Comparison results between DFNet and the previous state-of-the-art method Edge Connect [21]. In the first image of each group, white pixels represent the unknown region. With fusion blocks along with multi-scale constraints, DFNet has a smoother transition (1st case), more natural texture (2nd case), and more consistent structure (3rd case).

Abstract: Deep image completion usually fails to harmonically blend the restored image into existing content, especially in the boundary area. This paper handles this problem from a new perspective of creating a smooth transition and proposes a concise Deep Fusion Network (DFNet). Firstly, a fusion block is introduced to generate a flexible alpha composition map for combining known and unknown regions. The fusion block not only provides a smooth fusion between restored and existing content, but also provides an attention map to make the network focus more on the unknown pixels. In this way, it builds a bridge for structural and texture information, so that information can be naturally propagated from the known region into the completion. Furthermore, fusion blocks are embedded into several decoder layers of the network. Accompanied by adjustable loss constraints on each layer, more accurate structure information is achieved. We qualitatively and quantitatively compare our method with other state-of-the-art methods on the Places2 and CelebA datasets. The results show the superior performance of DFNet, especially in the aspects of harmonious texture transition, texture detail, and semantic structural consistency. Our source code will be available at: https://github.com/hughplay/DFNet

* This work is done when Xin Hong is an intern at Megvii Technology.
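The alpha composition idea in the abstract can be sketched as a per-pixel blend. This is only an illustrative sketch: the function name `fuse`, the flat lists of pixel intensities, and the convention that an alpha of 1 selects the restored content are assumptions, and in DFNet the alpha map is produced by the network rather than supplied by hand.

```python
from typing import List, Sequence


def fuse(restored: Sequence[float], known: Sequence[float],
         alpha: Sequence[float]) -> List[float]:
    """Blend restored and known pixel values with a per-pixel alpha map.

    Each output pixel is alpha * restored + (1 - alpha) * known, so an alpha
    near 1 takes the restored content and an alpha near 0 keeps the existing
    content. Intermediate values give a smooth transition at the boundary.
    """
    if not (len(restored) == len(known) == len(alpha)):
        raise ValueError("restored, known, and alpha must have the same length")
    if any(a < 0.0 or a > 1.0 for a in alpha):
        raise ValueError("alpha values must lie in [0, 1]")
    return [a * r + (1.0 - a) * k for r, k, a in zip(restored, known, alpha)]
```

A hard binary mask is the special case where every alpha is exactly 0 or 1; allowing fractional values is what lets the blend soften the seam between known and completed regions.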
