Precise 3D hand pose estimation can be used to improve the performance of human–computer interaction (HCI). Specifically, computer-vision-based hand pose estimation can make this process more natural. Most traditional computer-vision-based hand pose estimation methods use depth images as the input, which requires complicated and expensive acquisition equipment. Estimation through a single RGB image is more convenient and less expensive. Previous methods based on RGB images utilize only 2D keypoint score maps to recover 3D hand poses but ignore the hand texture features and the underlying spatial information in the RGB image, which leads to a relatively low accuracy. To address this issue, we propose a channel fusion attention mechanism that combines 2D keypoint features and RGB image features at the channel level. In particular, the proposed method replans weights by using cascading RGB images and 2D keypoint features, which enables rational planning and the utilization of various features. Moreover, our method improves the fusion performance of different types of feature maps. Multiple contrast experiments on public datasets demonstrate that the accuracy of our proposed method is comparable to the state-of-the-art accuracy.
Gesture recognition is an important way of human-computer interaction. With time going on, people are no longer satisfied with gesture recognition based on wearable devices, but hope to perform gesture recognition in a more natural way. Computer vision-based gesture recognition can transfer human feelings and instructions to computers conveniently and efficiently, and improve the efficiency of human-computer interaction significantly. The gesture recognition based on computer vision is mainly based on hidden Markov, dynamic time rounding algorithm and neural network algorithm. The process is roughly divided into three steps: image collection, hand segmentation, gesture recognition and classification. This paper reviews the computer vision-based gesture recognition methods in the past 20 years, analyses the research status at home and abroad, summarizes its current development, the advantages and disadvantages of different gesture recognition methods, and looks forward to the development trend of gesture recognition technology in the next stage.
Hand pose estimation from RGB images has always been a difficult task, owing to the incompleteness of the depth information. Moon et al. improved the accuracy of hand pose estimation by using a new network, InterNet, through their unique design. Still, the network still has potential for improvement. Based on the architecture of MobileNet v3 and MoGA, we redesigned a feature extractor that introduced the latest achievements in the field of computer vision, such as the ACON activation function and the new attention mechanism module, etc. Using these modules effectively with our network, architecture can better extract global features from an RGB image of the hand, leading to a greater performance improvement compared to InterNet and other similar networks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.