This work addresses the novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image. Most current methods for 3D hand analysis from monocular RGB images focus only on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of the hand. In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of the hand surface that contains richer information about both 3D hand shape and pose. To train the networks with full supervision, we create a large-scale synthetic dataset containing both ground-truth 3D meshes and 3D poses. When fine-tuning the networks on real-world datasets without 3D ground truth, we propose a weakly-supervised approach that leverages the depth map as weak supervision during training. Through extensive evaluations on our proposed new dataset and two public datasets, we show that our method can produce accurate and reasonable 3D hand meshes, and can achieve superior 3D hand pose estimation accuracy compared with state-of-the-art methods.
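The core building block of a Graph CNN operating on a hand mesh is a layer that aggregates features from each vertex's neighbors and applies a shared linear transform. The sketch below is a generic, degree-normalized graph-convolution layer in NumPy, not the paper's exact architecture; the toy 4-vertex chain graph, weights, and feature sizes are illustrative assumptions.

```python
import numpy as np

def graph_conv(X, A, W):
    """One generic graph-convolution layer: average each vertex's features
    with its neighbors' (self-loops included, degree-normalized), then
    apply a shared linear transform W followed by ReLU."""
    A_hat = A + np.eye(A.shape[0])            # adjacency with self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # inverse degree matrix
    return np.maximum(0.0, D_inv @ A_hat @ X @ W)

# Toy "mesh": 4 vertices in a chain, 3 input features, 2 output features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))   # per-vertex input features
W = rng.standard_normal((3, 2))   # shared layer weights
out = graph_conv(X, A, W)
print(out.shape)  # (4, 2): one transformed feature vector per vertex
```

Stacking such layers over the mesh connectivity lets each vertex's predicted 3D coordinates depend on an increasingly wide neighborhood of the hand surface.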
Using machine vision to identify and sort scattered regular targets is an urgent problem in automated production lines. This study proposes a three-dimensional (3D) recognition method that combines monocular vision with machine learning algorithms. Based on the color characteristics of the targets, the original color image is converted to YCbCr mode, and the 2D Otsu algorithm is used to perform gray-level image segmentation on the Cb channel. Haar-feature training is then carried out. A comparison between Haar-feature training and the Hough-transform method showed that the Haar-feature AdaBoost trainer achieved a recognition time of 31.00 ms with a false recognition rate of 3.91%. A strong classifier was formed by weighted combination, and a Hough contour-transform algorithm was used to correct the normal vector between the plane coordinate system and the camera coordinate system. The monocular vision system ensured that the camera's field of view was not obstructed while the dots were being struck. The angles between the targets and the horizontal plane were measured and calculated from the coordinate points of the identified plane features. The test results, compared against the Otsu and AdaBoost trainer, showed an error of no more than 0.25 mm between the prediction and training sets, with a correct-recognition rate of up to 95%. This shows that the Otsu and Haar-feature method based on the AdaBoost algorithm is feasible within a certain error range and meets the engineering requirements for solving the poses of regular 3D targets in automated production.
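The segmentation front end described above (RGB → YCbCr, then Otsu thresholding on the Cb channel) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses the standard 1D Otsu method rather than the 2D variant the study mentions, the ITU-R BT.601 conversion constants, and a synthetic test image invented for the demo.

```python
import numpy as np

def rgb_to_cb(img):
    """Extract the Cb (blue-difference chroma) channel from an RGB image
    (uint8, H x W x 3), using the ITU-R BT.601 conversion."""
    r = img[..., 0].astype(float)
    g = img[..., 1].astype(float)
    b = img[..., 2].astype(float)
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    return np.clip(cb, 0, 255).astype(np.uint8)

def otsu_threshold(gray):
    """Classic 1D Otsu: choose the gray level maximizing the
    between-class variance of the foreground/background split."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                   # class-0 probability
    mu = np.cumsum(p * np.arange(256))     # class-0 cumulative mean
    mu_t = mu[-1]                          # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.nanargmax(sigma_b))

# Synthetic scene: a blue (high-Cb) target patch on a gray background
img = np.full((64, 64, 3), 120, dtype=np.uint8)
img[16:48, 16:48] = [30, 30, 220]
cb = rgb_to_cb(img)
t = otsu_threshold(cb)
mask = cb > t          # binary segmentation of the target region
print(mask[32, 32], mask[0, 0])  # True False
```

Separating targets on the chroma channel rather than on luminance makes the split robust to lighting variation, which is the usual motivation for working in YCbCr before thresholding.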