To facilitate referential communication between humans and robots and mediate their differences in representing the shared environment, we are exploring embodied collaborative models for referring expression generation (REG). Instead of a single minimum description to describe a target object, episodes of expressions are generated based on human feedback during human-robot interaction. We particularly investigate the role of embodiment such as robot gesture behaviors (i.e., pointing to an object) and human's gaze feedback (i.e., looking at a particular object) in the collaborative process. This paper examines different strategies of incorporating embodiment and collaboration in REG and discusses their possibilities and challenges in enabling human-robot referential communication.
Linguistics studies have shown that action verbs often denote some Change of State (CoS) as the result of an action. However, the causality of action verbs and its potential connection with the physical world has not been systematically explored. To address this limitation, this paper presents a study on physical causality of action verbs and their implied changes in the physical world. We first conducted a crowdsourcing experiment and identified eighteen categories of physical causality for action verbs. For a subset of these categories, we then defined a set of detectors that detect the corresponding change from visual perception of the physical environment. We further incorporated physical causality modeling and state detection in grounded language understanding. Our empirical studies have demonstrated the effectiveness of causality modeling in grounding language to perception.
This study presents a learning-by-imitation technique that learns social robot interaction behaviors from natural humanhuman interaction data and requires minimum input from a designer. To solve the problem of responding to ambiguous human actions, a novel topic clustering algorithm based on action cooccurrence frequencies is introduced. The system learns humanreadable rules that dictate which action the robot should take, based on the most recent human action and the current estimated topic of conversation. The technique is demonstrated in a scenario where the robot learns to play the role of a travel agent. The proposed technique outperformed several baseline techniques in qualitative and quantitative evaluations. It responded more accurately to ambiguous questions and participants found it was easier to understand, provided more information, and required less effort to interact with.
In situated dialogue with artificial agents (e.g., robots), although a human and an agent are co-present, the agent's representation and the human's representation of the shared environment are significantly mismatched. Because of this misalignment, our previous work has shown that when the agent applies traditional approaches to generate referring expressions for describing target objects with minimum descriptions, the intended objects often cannot be correctly identified by the human. To address this problem, motivated by collaborative behaviors in human referential communication, we have developed two collaborative models - an episodic model and an installment model - for referring expression generation. Both models, instead of generating a single referring expression to describe a target object as in the previous work, generate multiple small expressions that lead to the target object with the goal of minimizing the collaborative effort. In particular, our installment model incorporates human feedback in a reinforcement learning framework to learn the optimal generation strategies. Our empirical results have shown that the episodic model and the installment model outperform previous non-collaborative models with an absolute gain of 6% and 21% respectively.
We envision a future where service robots autonomously learn how to interact with humans directly from human-human interaction data, without any manual intervention. In this paper, we present a data-driven pipeline that: (1) takes in low-level data of a human shopkeeper interacting with multiple customers (28 hours of collected data); (2) autonomously extracts high-level actions from that data; and (3) learns-without manual intervention-how a robotic shopkeeper should respond to customers' actions online. Our proposed system for learning the interaction logic uses neural networks to first learn which customer actions are important to respond to and then learn how the shopkeeper should respond to those important customer actions. We present a novel technique for learning which customer actions are important by first learning the hidden causal relationship between customer and shopkeeper actions. In an offline evaluation, we show that our proposed technique significantly outperforms state-of-the-art baselines, in both which customer actions are important and how to respond to them. CCS CONCEPTS • Computing methodologies → Learning from demonstrations; • Human-centered computing → HCI theory, concepts and models.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.