Contemporary approaches to perception, planning, estimation, and control have allowed robots to operate robustly as our remote surrogates in uncertain, unstructured environments. There is now an opportunity for robots to operate not only in isolation, but also with and alongside humans in our complex environments. Natural language provides an efficient and flexible medium through which humans can communicate with collaborative robots. Through significant progress in statistical methods for natural language understanding, robots are now able to interpret a diverse array of free-form navigation, manipulation, and mobile manipulation commands. However, most contemporary approaches require a detailed prior spatial-semantic map of the robot's environment that models the space of possible referents of the utterance. Consequently, these methods fail when robots are deployed in new, previously unknown, or partially observed environments, particularly when mental models of the environment differ between the human operator and the robot. This paper provides a comprehensive description of a novel learning framework that allows field and service robots to interpret and correctly execute natural language instructions in a priori unknown, unstructured environments. Integral to our approach is its use of language as a "sensor": inferring spatial, topological, and semantic information implicit in natural language utterances and then exploiting this information to learn a distribution over a latent environment model. We incorporate this distribution in a probabilistic language grounding model and infer a distribution over a symbolic representation of the robot's action space consistent with the utterance. We use imitation learning to identify a belief space policy that reasons over the environment and behavior distributions.
We evaluate our framework through a variety of different navigation and mobile manipulation experiments involving an unmanned ground vehicle, a robotic wheelchair, and a mobile manipulator, demonstrating the ability of the algorithm to follow natural language instructions without prior knowledge of the environments.
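The "language as a sensor" idea above can be illustrated with a minimal Bayesian re-weighting sketch: an utterance is treated as noisy evidence about the unobserved environment, shifting belief toward hypothetical world models consistent with the facts the utterance implies. All names, hypotheses, and the likelihood value here are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: treat an utterance as evidence about a latent
# environment. Each hypothesis is a candidate world model; hypotheses
# consistent with the utterance's implicit spatial-semantic facts are
# re-weighted upward. Names and probabilities are illustrative only.

def update_belief(hypotheses, weights, consistent_with_utterance, p_true=0.9):
    """Bayesian re-weighting: consistent hypotheses receive likelihood
    p_true, inconsistent ones 1 - p_true; weights are then renormalized."""
    new_w = [w * (p_true if consistent_with_utterance(h) else 1.0 - p_true)
             for h, w in zip(hypotheses, weights)]
    total = sum(new_w)
    return [w / total for w in new_w]

# Two candidate environment models for "pick up the box in the kitchen":
hypotheses = [
    {"box_in": "kitchen"},
    {"box_in": "hallway"},
]
weights = [0.5, 0.5]  # uniform prior over the unknown environment

# The utterance implies that a box exists in the kitchen.
posterior = update_belief(hypotheses, weights,
                          lambda h: h["box_in"] == "kitchen")
print(posterior)  # belief shifts toward the kitchen hypothesis
```

A grounding model could then evaluate candidate actions against this posterior rather than against a single assumed map, which is the spirit of the belief space policy described above.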
Language is an effective medium for bi-directional communication in human-robot teams. To infer the meaning of many instructions, robots need to construct a model of their surroundings that describes the spatial, semantic, and metric properties of objects from observations and prior information about the environment. Recent algorithms condition the expression of object detectors in a robot's perception pipeline on language to generate the minimal representation of the environment necessary to efficiently determine the meaning of the instruction. We expand on this work by introducing the ability to express hierarchies between detectors. This assists in the development of environment models suitable for more sophisticated tasks that may require modeling of kinematics, dynamics, and/or affordances between objects. To achieve this, we propose a novel extension of symbolic representations for language-guided adaptive perception that reasons over single-layer object detector hierarchies. Differences in perception performance and environment representations between adaptive perception and a suitable exhaustive baseline are explored through physical experiments on a mobile manipulator.
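The single-layer detector hierarchy can be sketched as follows: an utterance activates only the detectors it mentions, plus any parent detector each one depends on. The detector names and parent-child pairs below are illustrative assumptions, not the paper's actual vocabulary.

```python
# Hypothetical sketch of language-conditioned adaptive perception with a
# single-layer detector hierarchy: a child detector (e.g. "handle") cannot
# run without its parent (e.g. "door"), so mentioning the child activates
# both. Detector names and dependencies are illustrative only.

DETECTOR_PARENTS = {
    "handle": "door",   # child -> parent dependency
    "knob": "drawer",
    "door": None,
    "drawer": None,
    "person": None,
}

def active_detectors(utterance):
    """Return the minimal detector set: mentioned detectors plus parents."""
    active = set()
    for word in utterance.lower().split():
        name = word.strip(".,")
        while name in DETECTOR_PARENTS and name not in active:
            active.add(name)
            name = DETECTOR_PARENTS[name]
    return active

print(active_detectors("grasp the handle"))  # handle plus its parent, door
```

Running only this minimal set, rather than an exhaustive bank of detectors, is what keeps the environment representation compact for the task at hand.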
This paper provides a summary of recent work in the development of integrated, multi-physics models for controlled opto-mechanical systems. Standard approaches from the literature are used to model the dynamics of the structure, including piezoceramic actuators, and generate a simple state space system for the actuator-structure dynamics. Next, linear sensitivities of the coefficients of the standard Zernike orthonormal basis set for representing optical aberrations are generated with respect to motions of the optics around the equilibrium point, using commercially available ray-trace software. Using this linear sensitivity representation, the optical path difference (OPD) at a reference pupil plane is reconstructed. Inclusion of additional defocus terms in these Zernike coefficients yields a second aberration function that allows the OPD at a defocused pupil plane to be reconstructed as well. For both of these planes, Fourier analysis is used to obtain the images produced at the corresponding image plane. The two images are provided to a phase diversity algorithm that returns estimates of the Zernike coefficients, yielding an estimated aberration based on sensed images. Laser metrology sensor signals are modeled via the addition of physically realistic noise to the structural perturbation signals, and are used to develop a control system that automatically aligns the structure based on both the laser and phase diversity sensor measurements. The processes described herein are demonstrated on a model of an opto-mechanical system based on an approximate prescription for the Hubble Space Telescope combined with a simple flexible structure.
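The Fourier image-formation step described above can be sketched in a few lines: a Zernike defocus aberration defines the phase of a pupil function, and the point spread function (PSF) at the image plane is the squared magnitude of its Fourier transform. Grid size, aberration amplitude, and normalization below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hedged sketch of Fourier-optics image formation: build a circular pupil,
# apply a Zernike defocus phase, and obtain the PSF by FFT. Scales and
# amplitudes are illustrative only.

N = 128
y, x = np.mgrid[-1:1:N * 1j, -1:1:N * 1j]
rho = np.hypot(x, y)
aperture = (rho <= 1.0).astype(float)

def defocus_phase(coeff_waves):
    """Zernike defocus term Z_2^0 (Noll index 4): sqrt(3) * (2*rho^2 - 1),
    scaled by a coefficient in waves."""
    return coeff_waves * np.sqrt(3.0) * (2.0 * rho**2 - 1.0)

def psf(coeff_waves):
    """PSF = |FFT of the complex pupil function|^2, normalized to unit sum."""
    pupil = aperture * np.exp(2j * np.pi * defocus_phase(coeff_waves))
    field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
    img = np.abs(field) ** 2
    return img / img.sum()

# Defocus lowers the on-axis intensity relative to the unaberrated PSF
# (the Strehl ratio), which is the signal a phase diversity algorithm exploits.
strehl = psf(0.5).max() / psf(0.0).max()
print(f"Strehl ratio with 0.5 waves of defocus: {strehl:.3f}")
```

A phase diversity estimator compares an in-focus and a deliberately defocused image of this kind to recover the Zernike coefficients of the underlying aberration.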