Egocentric hand pose estimation is important for wearable cameras, since hand interactions are captured from an egocentric viewpoint. Several recent studies have addressed hand pose estimation using RGB or RGBD sensors. Although these methods provide accurate estimates, they have several limitations: RGB-based techniques have intrinsic difficulty converting relative 3D poses into absolute 3D poses, and RGBD-based techniques work reliably only in indoor environments. Recently, stereo-sensor-based techniques have gained increasing attention owing to their potential to overcome these limitations. However, to the best of our knowledge, few techniques and no real datasets are available for egocentric stereo vision. In this paper, we propose a top-down pipeline for estimating absolute 3D hand poses using stereo sensors, as well as a novel dataset for training. Our top-down pipeline consists of two steps: hand detection and hand pose estimation. Hand detection localizes hand regions, and hand pose estimation then predicts the positions of the hand joints. In particular, for hand pose estimation with a stereo camera, we propose an attention-based architecture called StereoNet, a geometry-based loss function called StereoLoss, and a novel 2D disparity map called StereoDMap for effective stereo feature learning. To collect the dataset, we propose a novel annotation method that reduces human annotation effort. Our dataset is publicly available at https://github.com/seo0914/SEH. We conducted comprehensive experiments to demonstrate the effectiveness of our approach compared with state-of-the-art methods.
Index Terms: Hand pose estimation, stereo vision, wearable sensors, egocentric view.
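To make the two-step structure concrete, below is a minimal sketch of a top-down detect-then-estimate pipeline in PyTorch. Only the overall structure (a detector followed by a stereo pose network that consumes left/right crops plus a disparity map) comes from the abstract; the module internals, tensor shapes, and the way the disparity channel is fused are illustrative assumptions, not the paper's StereoNet, StereoLoss, or StereoDMap.

```python
# Hedged sketch of a two-step top-down pipeline:
# stage 1 detects the hand region, stage 2 regresses absolute 3D joints
# from stereo crops and a disparity map. All layer choices are assumed.
import torch
import torch.nn as nn

class HandDetector(nn.Module):
    """Stage 1: predict a normalized hand bounding box from the left image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 4),  # (x, y, w, h) in [0, 1]
        )

    def forward(self, img):
        return self.net(img).sigmoid()

class StereoPoseNet(nn.Module):
    """Stage 2: regress 3D joints from left/right crops and a 2D
    disparity map (a stand-in for the paper's StereoDMap input)."""
    def __init__(self, num_joints=21):
        super().__init__()
        self.num_joints = num_joints
        self.encoder = nn.Sequential(
            nn.Conv2d(7, 32, 3, stride=2, padding=1), nn.ReLU(),  # 3+3+1 channels
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, num_joints * 3)

    def forward(self, left_crop, right_crop, disparity):
        x = torch.cat([left_crop, right_crop, disparity], dim=1)
        return self.head(self.encoder(x)).view(-1, self.num_joints, 3)

# Toy end-to-end pass with dummy tensors (cropping to the detected box
# is omitted here for brevity).
detector, pose_net = HandDetector(), StereoPoseNet()
left = torch.randn(1, 3, 128, 128)
right = torch.randn(1, 3, 128, 128)
disp = torch.randn(1, 1, 128, 128)
box = detector(left)                     # (1, 4) hand region
joints_3d = pose_net(left, right, disp)  # (1, 21, 3) absolute 3D joints
```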
Data augmentation is a well-known technique for improving the generalization performance of modern neural networks. Following the success of traditional random data augmentation for images (e.g., flipping, translation, and rotation), interest has recently surged in implicit data augmentation techniques that complement these random approaches. Implicit data augmentation augments training samples in feature space rather than in pixel space, generating semantically meaningful data. Several implicit data augmentation techniques have been introduced for classification tasks; however, few have been proposed for regression tasks with continuous or structured labels, such as object pose estimation. Motivated by this gap, we propose an implicit semantic data augmentation method for hand pose estimation. By considering the semantic distances between hand poses, the proposed method implicitly generates extra training samples in feature space. We propose two additional techniques to improve the performance of this augmentation: metric learning and hand-dependent augmentation. Metric learning learns feature representations that reflect the semantic distances between samples; for hand pose estimation, the distribution of augmented poses can then be regulated by controlling the distribution of feature representations. Meanwhile, hand-dependent augmentation is specifically designed for hand pose estimation to prevent semantically meaningless poses from being generated (e.g., poses produced by simple interpolation between a left and a right hand). Further, we demonstrate the effectiveness of the proposed technique on two well-known hand pose datasets: STB and RHD.
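The sketch below illustrates the general idea of feature-space augmentation for a regression task: perturb each feature vector along directions drawn from the feature covariance, with the perturbation magnitude tied to semantic (pose) distance. The function name, the nearest-neighbor scaling rule, and the label-preserving assumption are all illustrative choices, not the paper's exact formulation.

```python
# Hedged sketch of semantic-distance-aware feature-space augmentation
# for pose regression. The covariance-shaped noise and the scaling
# heuristic below are assumptions for illustration.
import torch

def augment_in_feature_space(features, poses, strength=0.5):
    """features: (N, D) penultimate-layer activations
    poses:    (N, J, 3) ground-truth 3D hand joints
    Returns extra (feature, pose) pairs sampled near each original."""
    # Estimate a shared covariance of the feature distribution.
    centered = features - features.mean(dim=0, keepdim=True)
    cov = centered.t() @ centered / max(features.shape[0] - 1, 1)

    # Semantic distance to the nearest other sample regulates how far
    # each feature may move (assumed heuristic): isolated poses get
    # larger perturbations, densely clustered poses smaller ones.
    flat = poses.flatten(1)
    dists = torch.cdist(flat, flat)
    dists.fill_diagonal_(float("inf"))
    nearest = dists.min(dim=1).values                   # (N,)
    scale = strength * nearest / (nearest.mean() + 1e-8)

    # Sample Gaussian perturbations shaped by the estimated covariance.
    L = torch.linalg.cholesky(cov + 1e-4 * torch.eye(cov.shape[0]))
    noise = torch.randn_like(features) @ L.t()
    aug_features = features + scale.unsqueeze(1) * noise
    return aug_features, poses  # perturbation assumed label-preserving

# Usage with dummy tensors: 8 samples, 64-dim features, 21 joints.
feats, gt = torch.randn(8, 64), torch.randn(8, 21, 3)
aug_feats, aug_gt = augment_in_feature_space(feats, gt)
```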
We propose AirPincher, a handheld device for recognizing delicate mid-air hand gestures. AirPincher is designed to overcome the disadvantages of the two existing kinds of hand-gesture-aware techniques: wearable-sensor-based and external-vision-based. Wearable-sensor-based techniques require the cumbersome donning of sensors for every use, while external-vision-based techniques suffer performance degradation as the distance between the user and a remote display grows. AirPincher lets a user hold the device in one hand and perform delicate mid-air finger gestures, which are captured by several sensors embedded in the device at close range. This design allows AirPincher to avoid the aforementioned disadvantages of the existing techniques. Supported gestures include rubbing the thumb against the middle finger, swiping the thumb along the index finger, and pinching the thumb and index finger together. Because these gestures provide inherent haptic feedback, AirPincher also supports eyes-free interaction. To validate AirPincher's feasibility, we implemented two use cases: controlling a pointing cursor and moving a virtual 3D object on a remote screen.