Estimating the 6D pose of objects using only RGB images remains challenging because of problems such as occlusion and symmetries. It is also difficult to construct 3D models with precise texture without expert knowledge or specialized scanning devices. To address these problems, we propose a novel pose estimation method, Pix2Pose, that predicts the 3D coordinates of each object pixel without textured models. An auto-encoder architecture is designed to estimate the 3D coordinates and expected errors per pixel. These pixel-wise predictions are then used in multiple stages to form 2D-3D correspondences, from which poses are computed directly via the PnP algorithm with RANSAC iterations. Our method is robust to occlusion by leveraging recent achievements in generative adversarial training to precisely recover occluded parts. Furthermore, a novel loss function, the transformer loss, is proposed to handle symmetric objects by guiding predictions to the closest symmetric pose. Evaluations on three different benchmark datasets containing symmetric and occluded objects show that our method outperforms the state of the art using only RGB images.
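The core idea of the transformer loss can be illustrated with a minimal sketch: instead of penalizing the distance to a single ground-truth coordinate map, the loss takes the minimum over the object's set of symmetric poses. The symmetry set (rotations about the z-axis), the toy point sets, and all function names below are invented for illustration and are not the paper's implementation.

```python
import math

def transformed(points, angle_deg):
    """Rotate 3D points about the z-axis (an assumed symmetry axis)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def coord_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth 3D coordinates."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def transformer_loss(pred, gt, symmetry_angles=(0, 180)):
    """Minimum error over the object's symmetric poses, so the network is
    only pulled toward the closest symmetric ground truth."""
    return min(coord_error(pred, transformed(gt, a)) for a in symmetry_angles)

gt = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]  # matches gt rotated by 180°
print(transformer_loss(pred, gt))  # ≈ 0: the prediction hits a symmetric pose
```

Without the minimum, the same prediction would incur an error of 2.0 against the untransformed ground truth, penalizing a pose that is visually indistinguishable.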
We propose a decentralized variant of Monte Carlo tree search (MCTS) that is suitable for a variety of tasks in multi-robot active perception. Our algorithm allows each robot to optimize its own actions by maintaining a probability distribution over plans in the joint-action space. Robots periodically communicate a compressed form of their search trees, which are used to update the joint distribution using a distributed optimization approach inspired by variational methods. Our method admits any objective function defined over robot action sequences, assumes intermittent communication, is anytime, and is suitable for online replanning. Our algorithm features a new MCTS tree expansion policy that is designed for our planning scenario. We extend the theoretical analysis of standard MCTS to provide guarantees for convergence rates to the optimal payoff sequence. We evaluate the performance of our method for generalized team orienteering and online active object recognition using real data, and show that it compares favorably to centralized MCTS even with severely degraded communication. These examples demonstrate the suitability of our algorithm for real-world active perception with multiple robots.
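The flavor of the distributed update can be sketched as follows: each robot keeps a categorical distribution over its own candidate plans, and re-weights it by the expected joint payoff under the plan distributions its teammates have communicated. This is a toy, single-round illustration under invented names (the `beta` temperature, the coverage payoff); the paper's actual update, compression scheme, and convergence analysis are more involved.

```python
import math, itertools

def expected_payoff(my_plan, others_dists, payoff):
    """Expected joint payoff of my_plan, marginalizing over the other
    robots' announced plan distributions."""
    total = 0.0
    for combo in itertools.product(*[d.items() for d in others_dists]):
        plans = [p for p, _ in combo]
        prob = math.prod(w for _, w in combo)
        total += prob * payoff([my_plan] + plans)
    return total

def update_distribution(my_plans, others_dists, payoff, beta=5.0):
    """Re-weight my candidate plans by a softmax of expected payoff
    (a variational-style update; beta is an invented temperature)."""
    scores = [expected_payoff(p, others_dists, payoff) for p in my_plans]
    exps = [math.exp(beta * s) for s in scores]
    z = sum(exps)
    return {p: e / z for p, e in zip(my_plans, exps)}

# Two robots choosing which of two sites to visit; the team is rewarded
# for coverage, i.e. the number of distinct sites visited.
def coverage(plans):
    return len(set(plans))

other = {"site_A": 0.9, "site_B": 0.1}  # teammate's announced distribution
mine = update_distribution(["site_A", "site_B"], [other], coverage)
print(mine)  # mass shifts toward site_B to avoid duplicating the teammate
```

Because the teammate will most likely visit site A, the update concentrates this robot's probability mass on site B, which is the coordination behavior the joint-distribution formulation is meant to produce.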
Developing robot perception systems for recognizing objects in the real world requires computer vision algorithms to be carefully scrutinized with respect to the expected operating domain. This demands large quantities of ground truth data to rigorously evaluate the performance of algorithms. This paper presents the EasyLabel tool for easily acquiring high-quality ground truth annotation of objects at the pixel level in densely cluttered scenes. In a semi-automatic process, complex scenes are incrementally built and EasyLabel exploits depth change to extract precise object masks at each step. We use this tool to generate the Object Cluttered Indoor Dataset (OCID), which captures diverse settings of objects, background, context, sensor-to-scene distance, viewpoint angle and lighting conditions. OCID is used to perform a systematic comparison of existing object segmentation methods. The baseline comparison supports the need for pixel- and object-wise annotation to progress robot vision towards realistic applications. This insight reveals the usefulness of EasyLabel and OCID to better understand the challenges that robots face in the real world.
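The depth-change principle behind the incremental labelling can be sketched in a few lines: after a new object is placed, the pixels whose depth changed beyond a threshold belong to that object. The grids, the 1 cm threshold, and the function name below are invented toy values for illustration, not EasyLabel's actual parameters.

```python
def object_mask(depth_prev, depth_new, threshold=0.01):
    """Label pixels whose depth changed after placing a new object.
    Depth maps are lists of rows of metric depths; threshold is in
    metres (the 1 cm value is an assumed example)."""
    return [
        [abs(a - b) > threshold for a, b in zip(row_prev, row_new)]
        for row_prev, row_new in zip(depth_prev, depth_new)
    ]

before = [[2.0, 2.0, 2.0],
          [2.0, 2.0, 2.0]]
after  = [[2.0, 1.5, 1.5],   # a new object now sits in front of the table
          [2.0, 1.5, 2.0]]
mask = object_mask(before, after)
print(mask)  # True exactly where the new object appears
```

Repeating this comparison at every placement step yields one clean mask per object even in dense clutter, since each mask is extracted before later objects can occlude it.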
The spatiotemporal behavior of human EEG oscillations is investigated. Traveling waves in the alpha and theta ranges are found to be common in both prestimulus and poststimulus EEG activity. The dynamical properties of these waves, including their speeds, directions, and durations, are systematically characterized for the first time, and the results show that there are significant changes of prestimulus spontaneous waves in the presence of an external stimulus. Furthermore, the functional relevance of these waves is examined by studying how they are correlated with reaction times on a single trial basis; prestimulus alpha waves traveling in the frontal-to-occipital direction are found to be most correlated to reaction speeds. These findings suggest that propagating waves of brain oscillations might be involved in mediating long-range interactions between widely distributed parts of human cortex.
This paper describes a vision-based obstacle detection and navigation system for use as part of a robotic solution for the sustainable intensification of broad-acre agriculture. To be cost-effective, the robotics solution must be competitive with current human-driven farm machinery. Significant costs are in high-end localization and obstacle detection sensors. Our system demonstrates a combination of an inexpensive global positioning system and inertial navigation system with vision for localization and a single stereo vision system for obstacle detection. The paper describes the design of the robot, including detailed descriptions of three key parts of the system: novelty-based obstacle detection, visually-aided guidance, and a navigation system that generates collision-free kinematically feasible paths. The robot has seen extensive testing over numerous weeks of field trials during the day and night. The results in this paper pertain to one particular 3 h nighttime experiment in which the robot performed a coverage task and avoided obstacles. Additional results during the day demonstrate that the robot is able to continue operating during 5 min GPS outages by visually following crop rows. © 2016 Wiley Periodicals, Inc.

This section describes the field environment, the robot (including the physical and software robot platform), and the algorithmic details of our navigation system.
We present an end-to-end method for active object classification in cluttered scenes from RGB-D data. Our algorithms predict the quality of future viewpoints in the form of entropy using both class and pose. Occlusions are explicitly modelled in predicting the visible regions of objects, which modulates the corresponding discriminatory value of a given view. We implement a one-step greedy planner and demonstrate our method online using a mobile robot. We also analyse the performance of our method compared to similar strategies in simulated execution using the Willow Garage dataset. Results show that our active method usefully reduces the number of views required to accurately classify objects in clutter as compared to traditional passive perception.
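The entropy-based view scoring at the heart of the one-step greedy planner can be sketched as follows. The candidate viewpoints and their predicted posterior class beliefs are invented numbers for illustration; in the paper these predictions also account for occlusion-dependent visibility.

```python
import math

def entropy(belief):
    """Shannon entropy (nats) of a class distribution."""
    return -sum(p * math.log(p) for p in belief if p > 0)

def best_view(predicted_beliefs):
    """Greedy one-step planner: pick the viewpoint whose predicted
    posterior class belief has the lowest entropy."""
    return min(predicted_beliefs, key=lambda v: entropy(predicted_beliefs[v]))

belief = [0.4, 0.35, 0.25]                 # current belief, fairly uncertain
predicted = {
    "view_left":  [0.45, 0.30, 0.25],      # barely discriminative (occluded)
    "view_front": [0.85, 0.10, 0.05],      # expected to disambiguate well
}
print(best_view(predicted))  # prints view_front
```

An occluded view leaves the belief nearly unchanged, so its predicted entropy stays high and the planner correctly prefers the unobstructed viewpoint.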
Classifying objects in complex unknown environments is a challenging problem in robotics and is fundamental in many applications. Modern sensors and sophisticated perception algorithms extract rich 3D textured information, but are limited to the data that are collected from a given location or path. We are interested in closing the loop around perception and planning, in particular to plan paths for better perceptual data, and focus on the problem of planning scanning sequences to improve object classification from range data. We formulate a novel time-constrained active classification problem and propose solution algorithms that employ a variation of Monte Carlo tree search to plan non-myopically. Our algorithms use a particle filter combined with Gaussian process regression to estimate joint distributions of object class and pose. This estimator is used in planning to generate a probabilistic belief about the state of objects in a scene, and also to generate beliefs for predicted sensor observations from future viewpoints. These predictions consider occlusions arising from predicted object positions and shapes. We evaluate our algorithms in simulation, in comparison to passive and greedy strategies. We also describe similar experiments where the algorithms are implemented online, using a mobile ground robot in a farm environment. Results indicate that our non-myopic approach outperforms both passive and myopic strategies, and clearly show the benefit of active perception for outdoor object classification.
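A stripped-down version of the belief estimator can be sketched as a particle filter over joint (class, pose) hypotheses: weight each particle by the likelihood of the latest observation, then resample. Everything here is a toy stand-in (a 1-D pose, a hand-written likelihood in place of the paper's Gaussian-process observation model, invented class names).

```python
import math, random

def resample(particles, weights):
    """Resample particles proportional to weight (multinomial resampling)."""
    return random.choices(particles, weights=weights, k=len(particles))

def update_belief(particles, observe_likelihood):
    """One particle-filter step over joint (class, pose) hypotheses.
    observe_likelihood stands in for the GP-based observation model."""
    weights = [observe_likelihood(c, pose) for c, pose in particles]
    return resample(particles, weights)

random.seed(0)
# Prior hypotheses over object class and a 1-D pose (toy setting).
particles = [(random.choice(["mug", "box"]), random.uniform(0, 1))
             for _ in range(500)]

def likelihood(cls, pose):
    # An observation that strongly favours "mug" near pose 0.5.
    return (2.0 if cls == "mug" else 0.5) * math.exp(-10 * (pose - 0.5) ** 2)

posterior = update_belief(particles, likelihood)
mug_frac = sum(c == "mug" for c, _ in posterior) / len(posterior)
print(round(mug_frac, 2))  # most posterior mass lands on "mug"
```

Because the same particle set carries both class and pose, the planner can query it twice: once for the current belief about the scene, and once to predict what a sensor would see, occlusions included, from a future viewpoint.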