Abstract: Speaking in unconstrained natural language is an intuitive and flexible way for humans to interact with robots. Understanding this kind of linguistic input is challenging because diverse words and phrases must be mapped into structures that the robot can understand, and elements in those structures must be grounded in an uncertain environment. We present a system that follows natural language directions by extracting a sequence of spatial description clauses from the linguistic input and then inferring the most probable path through the environment, given only the environmental geometry and detected visible objects. We use a probabilistic graphical model that factors into three key components. The first component grounds landmark phrases such as "the computers" in the robot's perceptual frame by exploiting co-occurrence statistics from a database of tagged images such as Flickr. Second, a spatial reasoning component judges how well spatial relations such as "past the computers" describe a path. Finally, verb phrases such as "turn right" are modeled according to the amount of change in orientation along the path. Our system follows 60% of the directions in our corpus to within 15 meters of the true destination, significantly outperforming other approaches.
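As a concrete illustration of the three-factor score described above, the following minimal Python sketch evaluates candidate paths against a single spatial description clause. The probability tables, helper names, and toy geometry are illustrative assumptions, not the authors' implementation.

```python
import math

# Factor 1: landmark grounding via tag co-occurrence statistics
# (how often "computers" co-occurs with a detector label in tagged images).
CO_OCCURRENCE = {("computers", "monitor"): 0.8, ("computers", "door"): 0.05}

def landmark_prob(landmark, detected_labels):
    return max(CO_OCCURRENCE.get((landmark, lbl), 0.01) for lbl in detected_labels)

# Factor 2: spatial relations score geometric features of the path
# relative to the grounded landmark (toy stand-in for "past").
def spatial_prob(relation, path_end_x, landmark_x):
    if relation == "past":
        return 0.9 if path_end_x > landmark_x else 0.1
    return 0.5

# Factor 3: verbs are modeled by the change in orientation along the path
# (e.g. "turn right" favors roughly -90 degrees of heading change).
def verb_prob(verb, heading_change_deg):
    mean = {"turn right": -90.0, "turn left": 90.0}.get(verb, 0.0)
    return math.exp(-((heading_change_deg - mean) ** 2) / (2 * 45.0 ** 2))

def sdc_score(landmark, relation, verb, path):
    """Log-probability that one spatial description clause fits a path."""
    return (math.log(landmark_prob(landmark, path["labels"]))
            + math.log(spatial_prob(relation, path["end_x"], path["landmark_x"]))
            + math.log(verb_prob(verb, path["heading_change"])))

# The most probable path maximizes the summed score over all clauses.
paths = [
    {"labels": ["monitor"], "end_x": 5.0, "landmark_x": 2.0, "heading_change": -80.0},
    {"labels": ["door"], "end_x": 1.0, "landmark_x": 2.0, "heading_change": 10.0},
]
best = max(paths, key=lambda p: sdc_score("computers", "past", "turn right", p))
print(best)
```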
The creation of a robot chef represents a grand challenge for the field of robotics. Cooking is one of the most important activities that takes place in the home, and a robotic chef capable of following arbitrary recipes would have many applications in both household and industrial environments. The kitchen is a semi-structured proving ground for algorithms in robotics. It poses many computational challenges, such as accurately perceiving ingredients in cluttered environments, manipulating objects, and engaging in complex activities such as mixing and chopping. Yet it also allows reasonable simplifying assumptions, owing to the inherent organization of a kitchen around a human-centric workspace, the consistency of kitchen tools and tasks, and the ordered nature of recipes. We envision a robotic chef, the BakeBot, which can collect recipes online, parse them into a sequence of low-level actions, and execute them for the benefit of its human partners. We present first steps toward this vision by combining techniques for object perception, manipulation, and language understanding into a novel end-to-end robot system able to follow simple recipes, and by experimentally assessing the performance of these approaches in the kitchen domain.

1 Problem Statement

This paper describes progress towards a robotic system that is able to read and execute simple recipes. The robot is initialized with a set of ingredients laid out on the table and a set of natural language instructions describing how to use those ingredients.
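As a toy illustration of the collect-parse-execute pipeline envisioned above, the sketch below maps recipe steps onto a small set of primitive actions. The primitive vocabulary and keyword matching are assumptions for exposition, not the BakeBot implementation.

```python
PRIMITIVES = ("pour", "mix", "scrape", "bake")

def parse_recipe(instructions):
    """Map each natural-language recipe step onto a low-level primitive."""
    plan = []
    for step in instructions:
        verb = next((w for w in step.lower().split() if w in PRIMITIVES), None)
        if verb is None:
            raise ValueError(f"no known primitive in step: {step!r}")
        plan.append((verb, step))
    return plan

def execute(plan):
    # Stand-in for perception and manipulation: a real system would
    # locate each ingredient on the table and drive the arm here.
    for verb, step in plan:
        print(f"executing {verb}: {step}")

execute(parse_recipe([
    "Pour the flour into the bowl",
    "Mix until combined",
    "Bake for 25 minutes",
]))
```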
Abstract: Robots inevitably fail, often without the ability to recover autonomously. We demonstrate an approach for enabling a robot to recover from failures by communicating its need for specific help to a human partner using natural language. Our approach automatically detects failures, then generates targeted spoken-language requests for help such as "Please give me the white table leg that is on the black table." Once the human partner has repaired the failure condition, the system resumes full autonomy. We present a novel inverse semantics algorithm for generating effective help requests. In contrast to forward semantic models that interpret natural language in terms of robot actions and perception, our inverse semantics algorithm generates requests by emulating the human's ability to interpret a request, using the Generalized Grounding Graph (G3) framework. To assess the effectiveness of our approach, we present a corpus-based online evaluation as well as an end-to-end user study, demonstrating that our approach increases the effectiveness of human interventions compared to static requests for help.
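The following schematic sketch illustrates the inverse semantics idea: the robot scores each candidate help request with its own forward understanding model and speaks the request a listener is most likely to ground to the object it actually needs. The word-overlap "forward model" here is a toy stand-in for the G3 likelihood, and all object and property names are hypothetical.

```python
OBJECTS = {                       # perceived objects and their properties
    "white_leg": {"white", "leg", "on_black_table"},
    "black_leg": {"black", "leg", "on_white_table"},
}

def forward_model(request_words, properties):
    """Toy P(object | request): more mentioned properties, higher score."""
    return 1.0 + len(request_words & properties)

def p_listener_correct(request_words, target):
    scores = {name: forward_model(request_words, props)
              for name, props in OBJECTS.items()}
    return scores[target] / sum(scores.values())

def best_request(candidates, target):
    # Inverse semantics: maximize the probability that the human's
    # interpretation of the request picks out the needed object.
    return max(candidates, key=lambda words: p_listener_correct(words, target))

requests = [{"leg"},                                  # ambiguous
            {"white", "leg", "on_black_table"}]       # targeted
print(best_request(requests, target="white_leg"))     # -> the targeted request
```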
In order for robots to engage in dialog with human teammates, they must have the ability to identify correspondences between elements of language and aspects of the external world. A solution to this symbol grounding problem (Harnad, 1990) would enable a robot to interpret commands such as "Drive over to receiving and pick up the tire pallet." This article describes several of our results that use probabilistic inference to address the symbol grounding problem. Our approach is to develop models that factor according to the linguistic structure of a command. We first describe an early result, a generative model that factors according to the sequential structure of language, then discuss our new framework, Generalized Grounding Graphs (G3). The G3 framework dynamically instantiates a probabilistic graphical model for a natural language input, enabling a mapping between words in language and concrete objects, places, paths and events in the external world. We report on corpus-based experiments in which the robot is able to learn and use word meanings in three real-world tasks: indoor navigation, spatial language video retrieval, and mobile manipulation.
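For reference, the factored distribution at the heart of G3 can be written as follows, where Lambda is the language, Gamma a set of candidate groundings, and Phi a vector of binary correspondence variables linking each linguistic constituent to its groundings (a sketch of the published formulation):

```latex
% G3 factorization: each binary correspondence variable \phi_i links a
% linguistic constituent \lambda_i to its candidate groundings \gamma_{i1..ik}.
p(\Phi \mid \Lambda, \Gamma) \approx \prod_i p(\phi_i \mid \lambda_i, \gamma_{i1}, \dots, \gamma_{ik})
```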
Natural language interfaces for robot control aspire to find the sequence of actions that best reflects the behavior intended by the instruction. This is difficult because of the diversity of language, the variety of environments, and the heterogeneity of tasks. Previous work has demonstrated that probabilistic graphical models constructed from the parse structure of natural language can be used to identify the motions that most closely resemble verb phrases. Such approaches, however, quickly succumb to computational bottlenecks imposed by constructing and searching the space of possible actions. Planning constraints, which define goal regions and separate the admissible and inadmissible states in an environment model, provide an interesting alternative representation of the meaning of verb phrases. In this paper we present a new model, the Distributed Correspondence Graph (DCG), that infers the most likely set of planning constraints from natural language instructions. A trajectory planner then uses these planning constraints to find a sequence of actions that resembles the instruction. Separating the problem of identifying the action encoded by the language into individual steps of planning-constraint inference and motion planning enables us to avoid the computational cost of generating and evaluating many trajectories. We present results from comparative experiments that demonstrate improved efficiency in natural language understanding without loss of accuracy.
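The two-stage decomposition can be sketched as follows: a constraint-inference step maps phrases to planning constraints, and a planner then searches only within the feasible set rather than scoring many candidate trajectories. The constraint vocabulary and the lookup-table "inference" below are assumptions for exposition, not the DCG model itself.

```python
# Stage 1: map each phrase to the most likely constraint. DCG infers
# these with a factored graphical model; a lookup table stands in here.
PHRASE_TO_CONSTRAINT = {
    "near the box": ("region", "within 0.5 m of box"),
    "avoid the table": ("keep_out", "table footprint"),
}

def infer_constraints(instruction):
    return [c for phrase, c in PHRASE_TO_CONSTRAINT.items()
            if phrase in instruction.lower()]

# Stage 2: a trajectory planner searches for a path that satisfies the
# inferred constraints, instead of evaluating many trajectories directly.
def plan(constraints):
    goals = [desc for kind, desc in constraints if kind == "region"]
    keep_out = [desc for kind, desc in constraints if kind == "keep_out"]
    return f"plan to {goals} while staying out of {keep_out}"

print(plan(infer_constraints("Move near the box and avoid the table.")))
```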