“…Previous research has focused on predicting affordances using computer vision [17], [18]. However, good-quality datasets are sparse, which some groups, such as Zhang et al. [19], try to address. Moreover, observational information can only be used to learn associations, in contrast to the causal learning enabled by interventions [11], and it neglects the central role of embodiment for robots and cognitive systems [20]. In earlier work, we demonstrated the usefulness of interventions for learning causal dependencies between actions in order to make more profound sense of human demonstrations in a shared environment [21].…”
Learning object affordances enables robots to plan and perform purposeful actions. However, a fundamental challenge for the utilization of affordance knowledge lies in its generalization to unknown objects and environments. In this paper we present a new method for learning causal relationships between object properties and object affordances that can be transferred to other environments. Our approach, implemented on a PR2 robot, generates hypotheses of property-affordance models in a toy environment based on human demonstrations, which are subsequently tested through interventional experiments. The system relies on information theory to choose experiments for maximal information gain, performs them in a self-supervised manner, and uses the observed outcomes to iteratively refine the set of candidate causal models. The learned causal knowledge is human-interpretable in the form of graphical models and is stored in a knowledge graph. We validate our method through a task requiring affordance knowledge transfer to three different unknown environments. Our results show that extending learning from human demonstrations with causal learning through interventions led to a 71.7% decrease in model uncertainty and improved affordance classification in the transfer environments by 47.49% on average.
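The experiment-selection loop sketched in this abstract can be illustrated with a small toy example: a posterior is maintained over candidate property-affordance causal models, and the intervention with the largest expected information gain (expected entropy reduction over the model set) is chosen next. The candidate models, interventions, and probabilities below are hypothetical illustrations, not the paper's actual hypothesis space or implementation.

```python
# Toy sketch of information-gain-driven experiment selection over candidate
# causal models. All names and numbers are assumed for illustration.
import math

# Each candidate model predicts the probability that the affordance (e.g.
# "liftable") is observed after a given intervention on an object property.
candidate_models = {
    "weight_causes_liftable":   {"set_light": 0.9, "set_rigid": 0.5},
    "material_causes_liftable": {"set_light": 0.5, "set_rigid": 0.8},
}
experiments = ["set_light", "set_rigid"]

# Uniform prior over the candidate causal models.
posterior = {m: 1.0 / len(candidate_models) for m in candidate_models}

def entropy(dist):
    """Shannon entropy (bits) of a distribution over candidate models."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_information_gain(experiment):
    """Expected entropy reduction over the model set after this experiment."""
    gain = 0.0
    for success in (True, False):
        # Likelihood of this outcome under each candidate model.
        lik = {m: candidate_models[m][experiment] if success
               else 1.0 - candidate_models[m][experiment]
               for m in candidate_models}
        # Marginal probability of the outcome under the current posterior.
        p_outcome = sum(posterior[m] * lik[m] for m in candidate_models)
        if p_outcome == 0:
            continue
        # Bayesian update of the posterior if this outcome were observed.
        updated = {m: posterior[m] * lik[m] / p_outcome for m in candidate_models}
        gain += p_outcome * (entropy(posterior) - entropy(updated))
    return gain

# Pick the intervention expected to be most informative about which model is correct.
best = max(experiments, key=expected_information_gain)
print("next experiment:", best,
      "expected gain (bits):", round(expected_information_gain(best), 3))
```

After each real intervention, the observed outcome would drive the same Bayesian update, iteratively concentrating the posterior on the causal models consistent with the data.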
“…Despite their importance and potential benefits, identifying the locations and moments of physical contacts in 3D environments remains a challenging problem that requires complex contextual data and advanced processing methods, such as wearable sensors and computer vision algorithms. Unsurprisingly, various methods for body-contact analysis have been proposed and evaluated in the fields of human activity recognition (HAR) and human-scene interaction (HSI) [2]-[7]. These methods often consider the physical affordances of a target object and 3D data to identify the interaction between the actor's body part and the object, while utilizing or developing novel sensing and processing approaches such as depth cameras, infrared (IR) cameras, inertial measurement units (IMUs), and light detection and ranging (LiDAR) sensors.…”
Research on the interaction between users and their environment has been conducted in various fields, including human activity recognition (HAR), human-scene interaction (HSI), computer graphics (CG), and virtual reality (VR). Typically, the interaction process commences with the movement of a human body part and involves contact with a target object or the environment. The choice of which body part makes contact depends on the interaction's purpose and affordance, making contact a fundamental aspect of interaction. However, detecting the specific body parts in contact, especially in the context of 3D motion and complex environments, poses computational challenges. To address this challenge, this study proposes a method for contact detection using motion data. The motion data utilized in this study are limited to actions feasible in an office environment. Since the contact states of different body parts are independent, the proposed method comprises two distinct models: a feature model generating common features for each body part and a part model recognizing the contact state of each body part. The feature model employs a bidirectional long short-term memory (Bi-LSTM) structure to capture the sequential nature of motion data, ensuring the incorporation of continuous data characteristics. In contrast, the part model employs separate weights optimized for each body part within the deep neural network. Experimental results demonstrate the proposed method's high accuracy, recall, and precision, with values of 0.99, 0.97, and 0.95, respectively.
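The two-stage architecture described in this abstract (a shared Bi-LSTM feature model followed by body-part-specific part models) could be sketched roughly as follows in PyTorch. The joint count, layer sizes, and list of body parts are illustrative assumptions, not the paper's exact configuration.

```python
# Rough sketch of a shared Bi-LSTM feature model feeding independent
# per-body-part contact classifiers. Sizes and names are assumed.
import torch
import torch.nn as nn

BODY_PARTS = ["left_hand", "right_hand", "left_foot", "right_foot"]  # assumed set

class ContactDetector(nn.Module):
    def __init__(self, n_joints=21, hidden=128):
        super().__init__()
        # Feature model: Bi-LSTM over the motion sequence (batch, time, joints * 3).
        self.feature_model = nn.LSTM(input_size=n_joints * 3, hidden_size=hidden,
                                     batch_first=True, bidirectional=True)
        # Part models: separate weights for each body part, each predicting a
        # per-frame contact / no-contact score from the shared features.
        self.part_models = nn.ModuleDict({
            part: nn.Sequential(nn.Linear(2 * hidden, 64), nn.ReLU(), nn.Linear(64, 1))
            for part in BODY_PARTS})

    def forward(self, motion):                    # motion: (B, T, n_joints * 3)
        features, _ = self.feature_model(motion)  # shared features: (B, T, 2 * hidden)
        # Contact states are treated as independent, so each head is applied separately.
        return {part: torch.sigmoid(head(features)).squeeze(-1)
                for part, head in self.part_models.items()}

# Usage on random data: a batch of 2 motion sequences, 60 frames each.
model = ContactDetector()
probs = model(torch.randn(2, 60, 21 * 3))
print({part: tuple(p.shape) for part, p in probs.items()})  # each: (2, 60)
```

Keeping one shared feature extractor while giving each body part its own classification head matches the abstract's observation that contact states are independent across parts but are all driven by the same underlying motion sequence.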