“…Although the analysis of hand-object interactions mostly involves bounding box annotations, a few works have studied hand-object relations using semantic segmentation mask annotations (González-Sosa et al., 2021; Zhang et al., 2022a; Darkhalil et al., 2022; Tokmakov et al., 2023). These works address semantic segmentation of hands and active objects in egocentric images (González-Sosa et al., 2021; Zhang et al., 2022a) or videos (Darkhalil et al., 2022; Tokmakov et al., 2023). Darkhalil et al. (2022) defined and predicted hand-object relations, including cases where the on-hand glove is in contact with an object in the environment.…”
Section: State-of-the-art
“…Due to the massive scale and unconstrained nature of Ego4D, it has proved useful for various tasks including action recognition (Liu et al., 2022a; Lange et al., 2023), action detection (Wang et al., 2023a), visual question answering (Bärmann & Waibel, 2022), active speaker detection (Wang et al., 2023d), natural language localisation, natural language queries (Ramakrishnan et al., 2023), gaze estimation (Lai et al., 2022), persuasion modelling for conversational agents (Lai et al., 2023b), audio-visual object localisation (Huang et al., 2023a), hand-object segmentation (Zhang et al., 2022b) and action anticipation (Ragusa et al., 2023a; Pasca et al., 2023; Mascaró et al., 2023). New tasks have also been introduced thanks to the diversity of Ego4D, e.g.…”
What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward-facing cameras and digital overlays, is expected to be integrated in our everyday lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through examples the limitations of current technology. We then provide a mapping between this future and previously defined research tasks. For each task, we survey its seminal works, current state-of-the-art methodologies and available datasets, then reflect on shortcomings that limit its applicability to future research. Note that this survey focuses on software models for egocentric vision, independent of any specific hardware. The paper concludes with recommendations for areas of immediate exploration so as to unlock our path to the future of always-on, personalised and life-enhancing egocentric vision.
“…Hand-object grasp reconstruction also employs contact to refine hand and object pose estimation [5, 15, 20, 52, 54]. In addition, some works [36, 47, 62] detect hands and classify their physical contact state into self-contact, person-person contact, and person-object contact. Although they consider the relationship between hands and other objects in the scene, they detect only a rough bounding box or boundary for the hand, instead of a finer-grained contact area.…”
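To make the coarse, hand-level formulation above concrete, the following is a minimal sketch of a per-hand contact-state head of the kind such detectors attach to pooled hand-box features. It is not the architecture of any of the cited works; the feature size, class set and names (e.g. ContactStateHead) are illustrative assumptions, and an upstream hand detector is presumed to supply the ROI features.

```python
# Minimal sketch (not a specific paper's model): classify each detected hand's
# physical contact state from its pooled ROI feature.
import torch
import torch.nn as nn

CONTACT_STATES = ["no_contact", "self_contact", "person_contact", "object_contact"]

class ContactStateHead(nn.Module):
    def __init__(self, in_dim: int = 256, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, len(CONTACT_STATES)),
        )

    def forward(self, hand_roi_feats: torch.Tensor) -> torch.Tensor:
        # hand_roi_feats: (num_hands, in_dim), one row per detected hand box.
        return self.mlp(hand_roi_feats)  # (num_hands, num_states) logits

# Usage: predict contact states for three detected hands (random features here).
head = ContactStateHead()
logits = head(torch.randn(3, 256))
states = [CONTACT_STATES[i] for i in logits.argmax(dim=1).tolist()]
```

The point of the sketch is the granularity: the prediction is one label per hand box, with no notion of which pixels are actually touching, which is exactly the limitation the quoted passage raises.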
Humans constantly contact objects to move and perform tasks. Thus, detecting human-object contact is important for building human-centered artificial intelligence. However, there exists no robust method to detect contact between the body and the scene from an image, and no dataset to learn such a detector. We fill this gap with HOT ("Human-Object conTact"), a new dataset of human-object contacts in images. To build HOT, we use two data sources: (1) We use the PROX dataset of 3D human meshes moving in 3D scenes, and automatically annotate 2D image areas for contact via 3D mesh proximity and projection. (2) We use the V-COCO, HAKE and Watch-n-Patch datasets, and ask trained annotators to draw polygons around the 2D image areas where contact takes place. We also annotate the body part involved in each contact. We use our HOT dataset to train a new contact detector, which takes a single color image as input and outputs 2D contact heatmaps as well as the body-part labels that are in contact. This is a new and challenging task that extends current foot-ground or hand-object contact detectors to the full generality of the whole body. The detector uses a part-attention branch to guide contact estimation through the context of the surrounding body parts and scene. We evaluate our detector extensively; quantitative results show that our model outperforms baselines and that all components contribute to better performance. Results on images from an online repository show reasonable detections and generalizability. Our HOT data and model are available for research at https://hot.is.tue.mpg.de.
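The automatic labelling route in source (1) can be pictured with a short sketch: body-mesh vertices that lie within a small distance of the scene mesh are treated as contacting, and are then projected into the image to mark 2D contact pixels. This is an illustrative reconstruction under assumed conventions (camera-frame vertices, a fixed distance threshold, simple pinhole projection), not the authors' released annotation code.

```python
# Sketch of contact labelling by 3D mesh proximity and projection, as described
# for the PROX-based portion of HOT. Threshold, KD-tree lookup and pinhole
# projection are illustrative assumptions.
import numpy as np
from scipy.spatial import cKDTree

def contact_mask(body_verts: np.ndarray,   # (N, 3) human mesh vertices, camera frame
                 scene_verts: np.ndarray,  # (M, 3) scene mesh vertices, camera frame
                 K: np.ndarray,            # (3, 3) pinhole camera intrinsics
                 img_hw: tuple,
                 thresh: float = 0.02) -> np.ndarray:
    """Return a binary (H, W) mask of pixels where the body touches the scene."""
    # 1) Proximity: a body vertex is "in contact" if a scene vertex lies within
    #    `thresh` metres of it.
    dists, _ = cKDTree(scene_verts).query(body_verts)
    contact_verts = body_verts[dists < thresh]

    # 2) Projection: map contacting 3D vertices to pixel coordinates.
    h, w = img_hw
    mask = np.zeros((h, w), dtype=bool)
    if len(contact_verts) == 0:
        return mask
    uvw = (K @ contact_verts.T).T                      # (C, 3) homogeneous image coords
    uv = np.round(uvw[:, :2] / uvw[:, 2:3]).astype(int)
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    mask[uv[valid, 1], uv[valid, 0]] = True
    return mask
```

In practice the projected points would still need to be grouped or dilated into coherent contact areas, and each area associated with a body-part label; the sketch keeps only the per-pixel splat that the proximity-and-projection idea implies.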