“…The folding step is further guided by multiscale features, representing both global and local information. Yang et al (2022) presented a shallow deep neural network that incorporates layers which iteratively refine the predicted hand pose. Hand pose estimation has expanded beyond the use of depth maps and RGB signals.…”
Section: State-of-the-art Papers
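The iterative-refinement idea in the snippet above can be illustrated with a short sketch: an initial regressor proposes 3D joints, and a stack of residual layers repeatedly corrects them from the same image features. All names, layer sizes and the 21-joint output here are illustrative assumptions, not the actual architecture of Yang et al (2022).

```python
# Minimal sketch of iterative hand-pose refinement (hypothetical names and
# sizes; the cited method's architecture and inputs may differ).
import torch
import torch.nn as nn

class RefineStep(nn.Module):
    def __init__(self, feat_dim=256, num_joints=21):
        super().__init__()
        # Maps image features + current joint estimate to a joint correction.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + num_joints * 3, 256),
            nn.ReLU(),
            nn.Linear(256, num_joints * 3),
        )

    def forward(self, feats, joints):
        # feats: (B, feat_dim); joints: (B, 21, 3)
        delta = self.mlp(torch.cat([feats, joints.flatten(1)], dim=1))
        return joints + delta.view_as(joints)  # residual update

class IterativeRegressor(nn.Module):
    def __init__(self, feat_dim=256, num_joints=21, steps=3):
        super().__init__()
        self.init_pose = nn.Linear(feat_dim, num_joints * 3)
        self.steps = nn.ModuleList([RefineStep(feat_dim, num_joints) for _ in range(steps)])

    def forward(self, feats):
        joints = self.init_pose(feats).view(feats.size(0), -1, 3)
        for step in self.steps:
            joints = step(feats, joints)  # each layer refines the previous estimate
        return joints
```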
“…They proposed EventHands, an approach that regresses 3D hand poses from locally-normalised event surfaces, a new way of accumulating events over temporal windows. Of these works, only Yang et al (2022) evaluated their method on egocentric hand pose, though the method was tested on general views.…”
Section: State-of-the-art Papers
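As a rough illustration of the event-surface idea, the sketch below accumulates a window of events into a two-channel frame in which each pixel keeps the window-normalised timestamp of its most recent event, per polarity. The function name, sensor resolution and exact normalisation are assumptions; the locally-normalised event surfaces of EventHands (Rudnev et al, 2021) may differ in detail.

```python
# Hedged sketch: accumulate events (x, y, t, polarity) over a temporal
# window into a 2-channel surface of normalised recent-event timestamps.
import numpy as np

def event_surface(events, t_start, t_end, height=180, width=240):
    """events: iterable of (x, y, t, polarity) tuples, polarity in {0, 1}.
    Resolution defaults are an assumption (DAVIS240-style sensor)."""
    surface = np.zeros((2, height, width), dtype=np.float32)
    span = t_end - t_start
    for x, y, t, p in events:
        if t_start <= t < t_end:
            # Later events overwrite earlier ones at the same pixel, so each
            # pixel stores the timestamp of its most recent event, in [0, 1).
            surface[p, y, x] = (t - t_start) / span
    return surface
```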
“…For the future: Even though there has been advancement across various research areas related to hand analysis, current approaches still come with their own set of limitations. State-of-the-art works in hand pose focus on analysing posture from different signals (Rudnev et al, 2021; Yang et al, 2022) or on challenging scenarios in which both hands interact simultaneously (Lee et al, 2023). At this stage, methods are able to predict hand pose in a large variety of domains thanks to the availability of large datasets acquired and labelled explicitly for these domains.…”
What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward-facing cameras and digital overlays, is expected to be integrated into our everyday lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through examples the limitations of current technology. We then provide a mapping between this future and previously defined research tasks. For each task, we survey its seminal works, current state-of-the-art methodologies and available datasets, then reflect on shortcomings that limit its applicability to future research. Note that this survey focuses on software models for egocentric vision, independent of any specific hardware. The paper concludes with recommendations for areas of immediate exploration to unlock our path to a future of always-on, personalised and life-enhancing egocentric vision.
“…Most previous works tackle 3D hand pose estimation [17,25,40,50,47] and object pose estimation [27,31,44,49] separately. Recently, joint hand-object pose estimation has received more attention [14,26,28,12,8,13,11] due to the strong correlation between hands and objects during interaction.…”
Section: Hand-object Pose Estimation
“…extended reality (XR) [38] and human-computer interaction (HCI) [24]. Although great efforts have been devoted to developing effective 3D hand pose estimation algorithms [17,25,40,50,47], joint hand-object pose estimation remains especially challenging due to severe mutual occlusion and the diverse ways in which hands manipulate objects. Methods that fail to tackle these challenges tend to produce physically implausible configurations, such as interpenetration or loss of contact.…”
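To make the interpenetration problem concrete, a toy penalty can approximate the hand as spheres around its joints and penalise object points that fall inside them. This is a deliberate simplification (real systems typically use signed distances to the full hand mesh); the function name and radius below are hypothetical.

```python
# Illustrative interpenetration penalty, approximating the hand as
# fixed-radius spheres centred on its joints (a simplification).
import torch

def penetration_loss(object_points, hand_joints, radius=0.01):
    # object_points: (N, 3), hand_joints: (J, 3), in metres.
    dists = torch.cdist(object_points, hand_joints)  # (N, J) pairwise distances
    nearest = dists.min(dim=1).values                # distance to the closest joint
    # Object points closer than the sphere radius are "inside" the hand.
    return torch.clamp(radius - nearest, min=0).sum()
```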
3D hand-object pose estimation is key to the success of many computer vision applications. The main focus of this task is to effectively model the interaction between the hand and an object. To this end, existing works either rely on interaction constraints in a computationally expensive iterative optimization, or consider only a sparse correlation between sampled hand and object keypoints. In contrast, we propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object. Specifically, we first construct the hand and object graphs according to their mesh structures. For each hand node, we aggregate features from every object node via learned attention, and vice versa for each object node. Thanks to such dense mutual attention, our method is able to produce physically plausible poses with high quality at real-time inference speed. Extensive quantitative and qualitative experiments on large benchmark datasets show that our method outperforms state-of-the-art methods. The code is available at https://github.com/rongakowang/DenseMutualAttention.git.
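A minimal sketch of the dense mutual attention idea follows, under the assumption of single-head attention with projections shared across both directions: every hand node aggregates features from every object node, and vice versa. Refer to the released code above for the authors' actual formulation.

```python
# Sketch of dense mutual attention between hand and object graph nodes
# (single head, shared projections: simplifications of the paper's design).
import torch
import torch.nn as nn

class MutualAttention(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def cross(self, x, y):
        # Every node in x attends to every node in y (dense attention).
        attn = torch.softmax(self.q(x) @ self.k(y).transpose(-2, -1) * self.scale, dim=-1)
        return x + attn @ self.v(y)  # residual feature aggregation

    def forward(self, hand_feats, obj_feats):
        # hand_feats: (B, H, dim); obj_feats: (B, O, dim)
        return self.cross(hand_feats, obj_feats), self.cross(obj_feats, hand_feats)
```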