It is important for a robot to be able to interpret natural language commands given by a human. In this paper, we consider performing a sequence of mobile manipulation tasks with instructions described in natural language. Given a new environment, even a simple task such as boiling water would be performed quite differently depending on the presence, location and state of the objects. We start by collecting a dataset of task descriptions in free-form natural language and the corresponding grounded task-logs of the tasks performed in an online robot simulator. We then build a library of verb–environment instructions that represents the possible instructions for each verb in that environment; these may or may not be valid for a different environment and task context. We present a model that takes into account the variations in natural language and ambiguities in grounding them to robotic instructions with appropriate environment context and task constraints. Our model also handles incomplete or noisy natural language instructions. It is based on an energy function that encodes such properties in a form isomorphic to a conditional random field. We evaluate our model on tasks given in a robotic simulator and show that it outperforms the state of the art with 61.8% accuracy. We also demonstrate a grounded robotic instruction sequence on a PR2 robot using the Learning from Demonstration approach.
There is a large variety of objects and appliances in human environments, such as stoves, coffee dispensers, juice extractors, and so on. It is challenging for a roboticist to program a robot for each of these object types and for each of their instantiations. In this work, we present a novel approach to manipulation planning based on the idea that many household objects share similarly-operated object parts. We formulate the manipulation planning as a structured prediction problem and design a deep learning model that can handle large noise in the manipulation demonstrations and learns features from three different modalities: point-clouds, language and trajectory. In order to collect a large number of manipulation demonstrations for different objects, we developed a new crowd-sourcing platform called Robobarista. We test our model on our dataset consisting of 116 objects with 249 parts along with 250 language instructions, for which there are 1225 crowd-sourced manipulation demonstrations. We further show that our robot can even manipulate objects it has never seen before.
Chalcogenide material Ge2Sb2Te5 (GST) has bistable phases, the so‐called amorphous and crystalline phases, which exhibit a large refractive index contrast. It can be reversibly switched on a nanosecond time scale by applying a thermal bias, especially through optical or electrical pulse signals. Recently, GST has been exploited as an ingredient of all‐optical dynamic metasurfaces, thanks to its ultrafast and efficient switching functionality. However, most of these devices provide only two‐level switching functionality, and this limitation hinders their application to diverse all‐optical systems. In this paper, a method is proposed to expand the switching functionality of GST metasurfaces to three levels by engineering a thermo‐optically creatable hybrid state in which amorphous and crystalline GST‐based meta‐atoms coexist. Furthermore, a novel hologram technique is introduced that provides visual information recognizable only in the hybrid‐state GST metasurface. Because creating the hybrid state is thermo‐optically complex, the metasurface allows the realization of a highly secure visual cryptography architecture without a complex optical setup. The phase‐change metasurface based on multi‐physical design has significant potential for applications such as all‐optical image encryption, security, and anti‐counterfeiting.
Since Leith and Upatnieks demonstrated the first optical hologram in 1964, hologram technology has attracted a great deal of interest in a wide range of optical fields owing to its potential use in future optical applications such as holographic imaging and optical data storage. Although there have been considerable efforts to develop holographic technologies using conventional optics, critical issues still hinder future development. Recently, metasurfaces composed of artificially fabricated subwavelength structures have been considered as novel holographic devices that show an unprecedented ability to control electromagnetic waves. In this review, we outline the recent progress in metasurface holography. A general introduction to several types of metasurface holography categorized based on their physics and application is provided. Then, our personal perspective on the future of this field is discussed.
Spiral phase contrast imaging offers an excellent opportunity to observe nonlabeled biological samples with slight variations in refractive index or thickness. However, the overall systems in previous works are still complex and bulky, hindering miniaturization and compatibility with conventional systems. Furthermore, high-resolution imaging, particularly for observing biological specimens such as cellular structures, requires several refractive optical elements such as objectives and relay optics, which dramatically increase the system form factor. Here, it is demonstrated that a metalens, in which the phase profile is a sum of the hyperbolic phase and spiral phase with a topological charge of 1, performs 2D isotropic edge-enhanced imaging. The metalens achieves a submicrometer resolution of up to 0.78 µm over a broad visible band while keeping a compact form factor. Experiments with biological samples further demonstrate the feasibility of practical use. Capitalizing on compactness and high-resolution characteristics, it is believed that the scheme provides a stepping stone to biomedical imaging technologies and analog computing.
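The phase profile described in the abstract above, a hyperbolic focusing phase summed with a spiral phase of topological charge 1, can be illustrated with a minimal sketch. The function name `metalens_phase`, the sign convention, and the focal length and wavelength used in the usage note are assumptions for illustration only; the abstract does not state the design values or the exact formula used by the authors.

```python
import math


def metalens_phase(x, y, focal_length, wavelength, charge=1):
    """Return a combined lens phase at (x, y), wrapped to [0, 2*pi).

    Hypothetical helper: sums the textbook hyperbolic focusing phase
    and a spiral phase charge * theta. All lengths share one unit.
    """
    # Hyperbolic phase: focuses a normally incident plane wave at focal_length.
    radial = math.sqrt(x * x + y * y + focal_length ** 2) - focal_length
    hyperbolic = -(2.0 * math.pi / wavelength) * radial
    # Spiral phase: azimuthal angle times the topological charge.
    spiral = charge * math.atan2(y, x)
    return (hyperbolic + spiral) % (2.0 * math.pi)
```

With an assumed 100 µm focal length and 532 nm wavelength, `metalens_phase(1e-6, 0.0, 100e-6, 532e-9)` and `metalens_phase(-1e-6, 0.0, 100e-6, 532e-9)` differ by π: diametrically opposite points share the same hyperbolic term but have opposite spiral phase when the charge is 1. That antisymmetry is what cancels uniform regions and leaves edges highlighted in every direction.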
Being able to detect and recognize human activities is essential for several applications, including personal assistive robotics. In this paper, we perform detection and recognition of unstructured human activity in unstructured environments. We use an RGBD sensor (Microsoft Kinect) as the input sensor, and compute a set of features based on human pose and motion, as well as based on image and pointcloud information. Our algorithm is based on a hierarchical maximum entropy Markov model (MEMM), which considers a person's activity as composed of a set of sub-activities. We infer the two-layered graph structure using a dynamic programming approach. We test our algorithm on detecting and recognizing twelve different activities performed by four people in different environments, such as a kitchen, a living room, an office, etc., and achieve good performance even when the person was not seen before in the training set.