Our goal is to develop models that allow a robot to understand natural language instructions in the context of its world representation. Contemporary models learn possible correspondences between parsed instructions and candidate groundings that include objects, regions, and motion constraints. However, these models cannot reason about abstract concepts expressed in an instruction such as "pick up the middle block in the row of five blocks." In this work, we introduce a probabilistic model that incorporates an expressive space of abstract spatial concepts as well as notions of cardinality and ordinality. The graphical model is structured according to the parse structure of language and introduces a factorisation over abstract concepts correlated with concrete constituents. Inference in the model is posed as an approximate search procedure that leverages partitioning of the joint in terms of concrete and abstract factors. The algorithm first estimates a set of probable concrete constituents that constrains the search procedure to a reduced space of abstract concepts, pruning away improbable portions of the exponentially large search space. Empirical evaluation demonstrates accurate grounding of abstract concepts embedded in complex natural language instructions commanding a robot manipulator. The proposed inference method leads to significant efficiency gains compared to the baseline, with minimal trade-off in accuracy.
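The two-stage search the abstract describes — score concrete constituents first, then search for abstract concepts only over the most probable ones — can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the function names, the `keep` parameter, and the two scoring callbacks are assumptions.

```python
# Illustrative sketch of the two-stage approximate inference: concrete
# groundings are ranked first, and abstract concepts (e.g. a "row" of
# blocks) are searched only over subsets of the top-ranked constituents,
# pruning the exponentially large space of all object subsets.
from itertools import combinations

def ground(phrases, candidates, score_concrete, score_abstract, keep=3):
    """Return the highest-scoring abstract grounding for each phrase.

    score_concrete(phrase, obj) -> float: likelihood of a concrete match.
    score_abstract(phrase, subset) -> float: likelihood that this subset
        of objects forms the abstract concept the phrase expresses.
    """
    results = {}
    for phrase in phrases:
        # Stage 1: keep only the most probable concrete constituents.
        ranked = sorted(candidates,
                        key=lambda o: score_concrete(phrase, o),
                        reverse=True)[:keep]
        # Stage 2: search abstract concepts over the reduced set.
        best, best_score = None, float("-inf")
        for r in range(1, len(ranked) + 1):
            for subset in combinations(ranked, r):
                s = score_abstract(phrase, subset)
                if s > best_score:
                    best, best_score = subset, s
        results[phrase] = best
    return results
```

With `keep` objects retained, stage 2 enumerates at most 2^keep subsets rather than 2^N over the whole workspace, which is where the claimed pruning of the search space comes from.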
Our goal is to develop models that allow a robot to efficiently understand or "ground" natural language instructions in the context of its world representation. Contemporary approaches estimate correspondences between language instructions and possible groundings such as objects, regions, and goals for actions that the robot should execute. However, these approaches typically reason in relatively small domains and do not model abstract spatial concepts such as "rows," "columns," or "groups" of objects and, hence, are unable to interpret an instruction such as "pick up the middle block in the row of five blocks." In this paper, we introduce two new models for efficient natural language understanding of robot instructions. The first model, which we call the adaptive distributed correspondence graph (ADCG), is a probabilistic model for interpreting abstract concepts that require hierarchical reasoning over constituent concrete entities as well as notions of cardinality and ordinality. Abstract grounding variables form a Markov boundary over concrete groundings, effectively de-correlating them from the remaining variables in the graph. This structure reduces the complexity of model training and inference. Inference in the model is posed as an approximate search procedure that orders factor computation such that the estimated probable concrete groundings focus the search for abstract concepts towards likely hypotheses, pruning away improbable portions of the exponentially large space of abstractions. Further, we address the issue of scalability to complex domains and introduce a second, hierarchical model termed the hierarchical adaptive distributed correspondence graph (HADCG). The model utilizes the abstractions in the ADCG but infers a coarse symbolic structure from the utterance and the environment model and then performs fine-grained inference over the reduced graphical model, further improving the efficiency of inference.
Empirical evaluation demonstrates accurate grounding of abstract concepts embedded in complex natural language instructions commanding a robotic torso and a mobile robot. Further, the proposed approximate inference method allows significant efficiency gains compared with the baseline, with minimal trade-off in accuracy.
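The coarse-to-fine inference attributed to the HADCG above can be sketched as a two-pass procedure: a cheap coarse pass decides which symbol types the utterance could express at all, and the expensive factor evaluations run only over the surviving portion of the graph. All names and scoring callbacks below are illustrative assumptions, not the paper's API.

```python
# Hypothetical coarse-to-fine sketch: pass 1 prunes whole families of
# factors using a cheap symbolic test; pass 2 runs the expensive,
# environment-conditioned evaluation over the reduced graphical model.
def coarse_to_fine(words, factors, coarse_score, fine_score,
                   threshold=0.5):
    """factors: list of (symbol, grounding) pairs in the full graph.

    coarse_score(words, symbol) -> float in [0, 1]: cheap test of
        whether the utterance could express this symbol type.
    fine_score(words, symbol, grounding) -> float: expensive factor
        evaluation against the environment model.
    """
    # Pass 1: coarse symbolic inference over the utterance alone.
    active = {s for s, _ in factors
              if coarse_score(words, s) >= threshold}
    # Pass 2: fine-grained inference over the reduced model.
    reduced = [(s, g) for s, g in factors if s in active]
    return max(reduced, key=lambda sg: fine_score(words, *sg))
```

The efficiency gain comes from the first pass: factors whose symbol type the utterance cannot plausibly express are never evaluated against the environment model.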
The goal of this article is to enable robots to perform robust task execution following human instructions in partially observable environments. A robot’s ability to interpret and execute commands is fundamentally tied to its semantic world knowledge. Commonly, robots use exteroceptive sensors, such as cameras or LiDAR, to detect entities in the workspace and infer their visual properties and spatial relationships. However, semantic world properties are often visually imperceptible. We posit the use of non-exteroceptive modalities including physical proprioception, factual descriptions, and domain knowledge as mechanisms for inferring semantic properties of objects. We introduce a probabilistic model that fuses linguistic knowledge with visual and haptic observations into a cumulative belief over latent world attributes to infer the meaning of instructions and execute the instructed tasks in a manner robust to erroneous, noisy, or contradictory evidence. In addition, we provide a method that allows the robot to communicate knowledge dissonance back to the human as a means of correcting errors in the operator’s world model. Finally, we propose an efficient framework that anticipates possible linguistic interactions and infers the associated groundings for the current world state, thereby bootstrapping both language understanding and generation. We present experiments on manipulators for tasks that require inference over partially observed semantic properties, and evaluate our framework’s ability to exploit expressed information and knowledge bases to facilitate convergence, and generate statements to correct declared facts that were observed to be inconsistent with the robot’s estimate of object properties.
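The cumulative belief over latent world attributes described above amounts to a recursive Bayesian update in which each modality contributes a likelihood over the latent value. The sketch below is a minimal illustration under that reading; the attribute, the likelihood tables, and all numbers are made-up assumptions, not values from the paper.

```python
# Minimal sketch of fusing multimodal evidence (language, haptics, ...)
# into a posterior belief over a discrete latent attribute, here
# "heavy" vs "light". Each modality supplies P(observation | value).
def fuse(prior, likelihoods):
    """prior: dict value -> P(value); likelihoods: list of
    dicts value -> P(observation | value). Returns the posterior."""
    post = dict(prior)
    for lik in likelihoods:
        post = {v: post[v] * lik[v] for v in post}  # Bayes' rule
        z = sum(post.values())
        post = {v: p / z for v, p in post.items()}  # normalize
    return post

prior = {"heavy": 0.5, "light": 0.5}
haptic = {"heavy": 0.8, "light": 0.2}   # proprioceptive evidence
spoken = {"heavy": 0.7, "light": 0.3}   # e.g. "that one is heavy"
belief = fuse(prior, [haptic, spoken])
```

Because the update is multiplicative, contradictory evidence simply pulls the posterior back toward uncertainty rather than breaking the estimate, which is one way to read the abstract's claim of robustness to noisy or contradictory observations.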
We propose a generalizable natural language interface that allows users to provide corrective instructions to an assistive robotic manipulator in real time. This work is motivated by the desire to improve collaboration between humans and robots in a home environment. Allowing human operators to modify properties of how their robotic counterpart achieves a goal on the fly increases the utility of the system by incorporating the strengths of the human partner (e.g., visual acuity and environmental knowledge). This work is applicable to users with and without disability. Our natural language interface is based on the distributed correspondence graph, a probabilistic graphical model that assigns semantic meaning to user utterances in the context of the robot's environment and current behavior. We then use the desired corrections to alter the behavior of the robotic manipulator by treating the modifications as constraints on the motion generation (planning) paradigm. In this paper, we highlight four dimensions along which a user may wish to correct the behavior of his or her assistive manipulator. We develop our language model using data collected from Amazon Mechanical Turk to capture a comprehensive sample of terminology that people use to describe desired corrections. We then develop an end-to-end system using open-source speech-to-text software and a Kinova Robotics MICO robotic arm. To demonstrate the efficacy of our approach, we run a pilot study with users unfamiliar with robotic systems and analyze points of failure and future directions.
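The idea of treating grounded corrections as constraints on motion generation can be illustrated as below. The specific correction phrases, the constraint fields, and their numeric bounds are all hypothetical examples, not the four dimensions or values from the paper.

```python
# Illustrative sketch: each grounded correction tightens a bound that
# the downstream trajectory generator must respect. The fields and
# update rules here are assumed examples.
from dataclasses import dataclass

@dataclass
class MotionConstraints:
    max_speed: float = 0.25        # m/s, end-effector speed limit
    min_height: float = 0.0        # m above the table surface
    max_grip_force: float = 10.0   # N, gripper closing force
    approach_angle: float = 90.0   # degrees from horizontal

def apply_correction(c: MotionConstraints,
                     correction: str) -> MotionConstraints:
    """Map a grounded correction phrase to a tightened constraint set."""
    if correction == "slower":
        c.max_speed *= 0.5
    elif correction == "higher":
        c.min_height += 0.05
    elif correction == "gentler":
        c.max_grip_force *= 0.5
    elif correction == "from the side":
        c.approach_angle = 0.0
    return c
```

A planner that consumes `MotionConstraints` then regenerates the trajectory under the tightened bounds, so corrections take effect without replanning the task itself.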