This paper presents INVIGORATE, a robot system that interacts with humans through natural language and grasps a specified object in clutter. The objects may occlude, obstruct, or even stack on top of one another. INVIGORATE embodies several challenges: (i) inferring the target object among other occluding objects from input language expressions and RGB images, (ii) inferring object blocking relationships (OBRs) from the images, and (iii) synthesizing a multi-step plan that asks questions to disambiguate the target object and grasps it successfully. We train separate neural networks for object detection, visual grounding, question generation, and OBR detection and grasping. These networks allow for unrestricted object categories and language expressions, subject to the training datasets. However, errors in visual perception and ambiguity in human language are inevitable and degrade the robot's performance. To overcome these uncertainties, we build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules. Through approximate POMDP planning, the robot tracks the history of observations and asks disambiguation questions in order to achieve a near-optimal sequence of actions that identify and grasp the target object. INVIGORATE thus combines the benefits of model-based POMDP planning and data-driven deep learning. Preliminary experiments with INVIGORATE on a Fetch robot show significant benefits of this integrated approach to object grasping in clutter with natural language interaction. A demonstration video is available online.
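The belief-tracking step at the heart of this kind of POMDP planning can be illustrated with a minimal sketch. This is not the authors' implementation: the grounding scores and answer likelihoods below are invented numbers standing in for the outputs of the visual-grounding and question-answering modules.

```python
import numpy as np

def belief_update(prior, likelihood):
    """One Bayesian belief-update step: posterior ∝ prior × likelihood."""
    posterior = prior * likelihood
    return posterior / posterior.sum()

# Hypothetical grounding scores over three candidate objects for the
# expression "the cup" -- the robot's initial belief about the target.
prior = np.array([0.5, 0.3, 0.2])

# The user answers "yes" to a disambiguation question about object 0;
# an (assumed) answer model assigns high likelihood to object 0.
likelihood_yes = np.array([0.9, 0.1, 0.1])

posterior = belief_update(prior, likelihood_yes)
# The belief concentrates on object 0, so the planner can now grasp
# rather than ask another question.
```

A real planner would weigh the expected information gain of each question against the cost of grasping under the current belief; the sketch only shows how an answer sharpens the belief.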
Based on an epidemic dynamical system, we construct a new agent-based financial time series model. To verify its validity, we compare the statistical properties of the model's time series with those of two real stock market indices, the Shanghai Stock Exchange Composite Index and the Shenzhen Stock Exchange Component Index. For the statistical analysis, we combine multi-parameter analysis with tail distribution analysis, modified rescaled range analysis, and multifractal detrended fluctuation analysis, and present the results in three-dimensional diagrams for a clearer perspective. The empirical results indicate that both the real returns and the proposed model exhibit long-range dependence and multifractality. The new agent-based financial model can therefore reproduce some important features of real stock markets.
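The general idea of an epidemic-driven return series can be sketched as follows. This is a toy SIR-style model, not the paper's model: agent counts, rates, and the mapping from infection dynamics to returns are all illustrative assumptions.

```python
import random

def simulate_epidemic_returns(n_agents=1000, steps=250,
                              beta=0.3, gamma=0.1, seed=0):
    """Toy SIR-style epidemic of trading sentiment: susceptible agents
    become 'infected' (actively trading) at rate beta, recover at rate
    gamma, and the change in the infected fraction drives daily returns."""
    rng = random.Random(seed)
    infected = n_agents // 100          # small initial infected group
    susceptible = n_agents - infected
    returns = []
    prev_frac = infected / n_agents
    for _ in range(steps):
        # Each susceptible agent is infected with probability
        # proportional to the current infected fraction.
        new_inf = sum(1 for _ in range(susceptible)
                      if rng.random() < beta * infected / n_agents)
        recovered = sum(1 for _ in range(infected)
                        if rng.random() < gamma)
        infected += new_inf - recovered
        susceptible -= new_inf
        frac = infected / n_agents
        # Return = sentiment change plus idiosyncratic Gaussian noise.
        returns.append((frac - prev_frac) + 0.01 * rng.gauss(0.0, 1.0))
        prev_frac = frac
    return returns

rets = simulate_epidemic_returns()
```

Statistical tests such as rescaled range analysis or MF-DFA would then be applied to `rets` in the same way as to real index returns.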
Particle-based Object Manipulation (PROMPT) is a new method for robot manipulation of novel objects, without prior object models or pre-training on a large object dataset. The key element of PROMPT is a particle-based object representation, in which each particle represents a point in an object, the local geometric, physical, and other features of that point, and its relation to other particles. The particle representation connects visual perception with robot control. Like data-driven methods, PROMPT infers the object representation online in real time from the visual sensor. Like model-based methods, PROMPT leverages the particle representation to reason about the object's geometry and dynamics, and chooses suitable manipulation actions accordingly. PROMPT thus combines the strengths of model-based and data-driven methods. We show empirically that PROMPT successfully handles a variety of everyday objects, some of which are transparent, across various manipulation tasks, including grasping and pushing. Our experiments also show that PROMPT outperforms a state-of-the-art data-driven grasping method on everyday household objects, even though it does not use any offline training data. The code and a demonstration video are available online. PROMPT is a simple idea, but it works surprisingly well.
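The flavor of such a particle representation can be sketched in a few lines. This is not PROMPT's actual data structure or grasp planner; the class, its fields, and the "highest and most central point" heuristic are all illustrative assumptions.

```python
import numpy as np

class ParticleObject:
    """Toy particle-based object representation: each particle stores a
    3-D point plus a feature vector (e.g. local normal, friction estimate),
    so geometry and physical attributes live in one structure."""

    def __init__(self, points, features):
        self.points = np.asarray(points, dtype=float)      # shape (N, 3)
        self.features = np.asarray(features, dtype=float)  # shape (N, F)

    def centroid(self):
        return self.points.mean(axis=0)

    def top_grasp_point(self):
        """Pick the highest particle nearest the vertical axis through the
        centroid -- a crude stand-in for model-based grasp reasoning."""
        c = self.centroid()
        xy_dist = np.linalg.norm(self.points[:, :2] - c[:2], axis=1)
        score = self.points[:, 2] - xy_dist   # prefer high and central
        return self.points[np.argmax(score)]

# Four particles sampled from a small object (heights in metres).
pts = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.2], [0.0, 0.1, 0.1], [0.05, 0.05, 0.3]]
feats = [[1.0]] * 4                     # one dummy feature per particle
obj = ParticleObject(pts, feats)
grasp = obj.top_grasp_point()           # the high, central particle
```

Because particles carry both geometry and per-point features, the same structure can feed a physics-style dynamics rollout for pushing as well as grasp selection.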
China’s Chang’e lunar exploration project obtains digital orthophoto image (DOM) and digital elevation model (DEM) data covering the whole Moon, which are critical to lunar research. The DOM data have three resolutions (7, 20, and 50 m), while the DEM data have two (20 and 50 m). Analysis of these image data helps humans understand the Moon. In addition, impact craters are the most basic feature of the Moon’s surface, and statistics on their size and distribution are essential for lunar geology. Existing works, however, reconstruct the lunar surface less accurately and provide insufficient semantic information about the craters. To build a three-dimensional (3D) model of the Moon with crater information from Chang’e data in the Chang’e reference frame, we propose a four-step framework. First, annotation software is implemented for labeling lunar impact craters in the Chang’e data, building on our existing study of an auxiliary annotation method and the open-source software LabelMe. Second, the auxiliary annotation software is used to annotate six segments of the Chang’e data, yielding 25,250 impact crater targets in total. Existing but inaccurate crater catalogs are combined with our labeled data to generate a larger crater dataset, which is analyzed and compared with common detection datasets. Third, deep learning detection methods are employed to detect impact craters; because the resolution of the Chang’e data is very high, a quadtree decomposition is applied. Lastly, a geographic information system is used to map the DEM data to 3D space and annotate the semantic information of the impact craters. In brief, a 3D model of the Moon with crater information is built from Chang’e data in the Chang’e reference frame, which is of considerable significance.
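The quadtree decomposition used to cope with very high-resolution imagery can be sketched as a recursive split into quadrants until every tile fits the detector's input budget. The tile sizes below are illustrative, not the paper's actual parameters.

```python
def quadtree_tiles(x, y, w, h, max_size):
    """Recursively split the image region (x, y, w, h) into quadrants
    until every tile is at most max_size pixels on both axes; returns a
    flat list of (x, y, w, h) tiles that exactly cover the region."""
    if w <= max_size and h <= max_size:
        return [(x, y, w, h)]
    hw, hh = w // 2, h // 2
    tiles = []
    for dx, dw in ((0, hw), (hw, w - hw)):       # left / right halves
        for dy, dh in ((0, hh), (hh, h - hh)):   # top / bottom halves
            tiles.extend(quadtree_tiles(x + dx, y + dy, dw, dh, max_size))
    return tiles

# Split a hypothetical 4096x4096 DOM segment into detector-sized tiles.
tiles = quadtree_tiles(0, 0, 4096, 4096, 1024)
# → 16 tiles of 1024x1024 that tile the segment without gaps or overlap.
```

Each tile is then fed to the crater detector independently, and detections are mapped back to segment coordinates via the tile's `(x, y)` offset.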