NASA's two Mars Exploration Rovers (MER) have successfully demonstrated a robotic Visual Odometry capability on another world for the first time. This provides each rover with accurate knowledge of its position, allowing it to autonomously detect and compensate for any unforeseen slip encountered during a drive. It has enabled the rovers to drive safely and more effectively in highly sloped and sandy terrains and has resulted in increased mission science return by reducing the number of days required to drive into interesting areas. The MER Visual Odometry system comprises onboard software for comparing stereo pairs taken by the pointable mast-mounted 45 deg FOV Navigation cameras (NAVCAMs). The system computes an update to the 6 degree of freedom rover pose (x, y, z, roll, pitch, yaw) by tracking the motion of autonomously selected terrain features between two pairs of 256 × 256 stereo images. It has demonstrated good performance with high rates of successful convergence (97% on Spirit, 95% on Opportunity), successfully detected slip ratios as high as 125%, and measured changes as small as 2 mm, even while driving on slopes as high as 31 deg. Visual Odometry was used over 14% of the first 10.7 km driven by both rovers. During the first 2 years of operations, Visual Odometry evolved from an "extra credit" capability into a critical vehicle safety system. In this paper we describe our Visual Odometry algorithm, discuss several driving strategies that rely on it (including Slip Checks, Keep-out Zones, and Wheel Dragging), and summarize its results from the first 2 years of operations on Mars.
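As a rough illustration of how a slip ratio and a Slip Check might be computed, the sketch below compares the distance commanded through the wheels with the distance a visual odometry update reports. The slip definition, the function names `slip_ratio` and `slip_check`, and the 50% threshold are all assumptions made for this example; the abstract does not specify them, and this is not the MER flight software.

```python
def slip_ratio(commanded_m: float, measured_m: float) -> float:
    """Fraction of the commanded distance that was lost to slip.

    Assumed definition (not stated in the abstract):
        slip = (commanded - measured) / commanded
    A value above 1.0 means the vehicle ended up behind its start point,
    for example by sliding downhill while trying to climb.
    """
    if commanded_m == 0:
        raise ValueError("commanded distance must be non-zero")
    return (commanded_m - measured_m) / commanded_m


def slip_check(commanded_m: float, measured_m: float,
               max_slip: float = 0.5) -> bool:
    """Return True if the drive may continue (hypothetical threshold)."""
    return slip_ratio(commanded_m, measured_m) <= max_slip


if __name__ == "__main__":
    # Commanded 1.0 m forward, visual odometry reports 0.25 m backward.
    print(slip_ratio(1.0, -0.25))   # slip of 125% under this definition
    print(slip_check(1.0, -0.25))   # drive would be halted
```

Under this assumed definition, a slip above 100% corresponds to net backward motion, which is one way a figure such as 125% can arise on a steep slope.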
Autonomous navigation in cross-country environments presents many new challenges compared with more traditional, urban environments. The lack of highly structured components in the scene complicates the design of even basic functionalities such as obstacle detection. In addition to the geometric description of the scene, terrain typing is also an important component of the perceptual system. Recognizing the different classes of terrain and obstacles enables the path planner to choose the most efficient route toward the desired goal. This paper presents new sensor processing algorithms that are suitable for cross-country autonomous navigation. We consider two sensor systems that complement each other in an ideal sensor suite: a color stereo camera, and a single axis ladar. We propose an obstacle detection technique, based on stereo range measurements, that does not rely on typical structural assumptions about the scene (such as the presence of a visible ground plane); a color-based classification system to label the detected obstacles according to a set of terrain classes; and an algorithm for the analysis of ladar data that allows one to discriminate between grass and obstacles (such as tree trunks or rocks), even when such obstacles are partially hidden in the grass. These algorithms have been developed and implemented by the Jet Propulsion Laboratory (JPL) as part of its involvement in a number of projects sponsored by the US Department of Defense, and have enabled safe autonomous navigation in highly vegetated, off-road terrain.
NASA's Mars Exploration Rover (MER) missions will land twin rovers on the surface of Mars in 2004. These rovers will have the ability to navigate safely through unknown and potentially hazardous terrain, using autonomous passive stereo vision to detect potential terrain hazards before driving into them. Unfortunately, the computational power of currently available radiation-hardened processors limits the distance (and therefore the science) that any rover can safely achieve in a given time frame.
This paper discusses the problem of recognizing interaction-level human activities from a first-person viewpoint. The goal is to enable an observer (e.g., a robot or a wearable camera) to understand 'what activity others are performing to it' from continuous video inputs. These include friendly interactions such as 'a person hugging the observer' as well as hostile interactions like 'punching the observer' or 'throwing objects to the observer', whose videos involve a large amount of camera ego-motion caused by physical interactions. The paper investigates multichannel kernels to integrate global and local motion information, and presents a new activity learning/recognition methodology that explicitly considers temporal structures displayed in first-person activity videos. In our experiments, we not only show classification results with segmented videos, but also confirm that our new approach is able to detect activities from continuous videos reliably.
In this paper, we present a new feature representation for first-person videos. In first-person video understanding (e.g., activity recognition), it is very important to capture both entire scene dynamics (i.e., egomotion) and salient local motion observed in videos. We describe a representation framework based on time series pooling, which is designed to abstract short-term/long-term changes in feature descriptor elements. The idea is to keep track of how descriptor values are changing over time and summarize them to represent motion in the activity video. The framework is general, handling any type of per-frame feature descriptor, including conventional motion descriptors like histogram of optical flows (HOF) as well as appearance descriptors from more recent convolutional neural networks (CNN). We experimentally confirm that our approach clearly outperforms previous feature representations including bag-of-visual-words and improved Fisher vector (IFV) when using identical underlying feature descriptors. We also confirm that our feature representation has superior performance to existing state-of-the-art features like local spatio-temporal features and Improved Trajectory Features (originally developed for 3rd-person videos) when handling first-person videos. Multiple first-person activity datasets were tested under various settings to confirm these findings.
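The idea of time series pooling can be sketched as follows: treat each descriptor element as a time series across frames, then summarize that series over the whole video (long-term) and over shorter sub-windows (short-term). The specific pooling operators used here (max, sum, and counts of rising and falling steps), the two-level window split, and the names `pool_series` and `time_series_pooling` are assumptions chosen for illustration, not necessarily the operators used in the paper.

```python
from typing import List, Sequence


def pool_series(series: Sequence[float]) -> List[float]:
    """Summarize one descriptor element's values over a time window."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return [
        max(series),                            # peak response
        sum(series),                            # accumulated response
        float(sum(1 for d in diffs if d > 0)),  # number of rising steps
        float(sum(1 for d in diffs if d < 0)),  # number of falling steps
    ]


def time_series_pooling(frames: Sequence[Sequence[float]],
                        levels: int = 2) -> List[float]:
    """Pool per-frame descriptors over a simple temporal pyramid.

    frames: one descriptor (list of floats) per video frame.
    levels: level k splits the video into 2**k equal windows, so
            level 0 captures long-term and deeper levels short-term change.
    """
    num_frames = len(frames)
    if num_frames < 2 ** (levels - 1):
        raise ValueError("not enough frames for the requested levels")
    dim = len(frames[0])
    features: List[float] = []
    for level in range(levels):
        segments = 2 ** level
        for s in range(segments):
            start = s * num_frames // segments
            end = (s + 1) * num_frames // segments
            for d in range(dim):
                series = [frames[t][d] for t in range(start, end)]
                features.extend(pool_series(series))
    return features


if __name__ == "__main__":
    video = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.0], [2.0, 0.0]]
    print(time_series_pooling(video))
```

The output length is fixed by the descriptor dimension and the number of levels, not by the video length, so videos of different durations map to vectors of the same size. The per-frame descriptor itself (HOF, CNN activations, or anything else) is left untouched, which matches the abstract's claim that the framework is descriptor-agnostic.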