Abstract-This paper reports on WaterGAN, a generative adversarial network (GAN) for generating realistic underwater images from in-air image and depth pairings in an unsupervised pipeline used for color correction of monocular underwater images. Cameras onboard autonomous and remotely operated vehicles can capture high-resolution images to map the seafloor; however, underwater image formation is subject to the complex process of light propagation through the water column. The raw images retrieved are characteristically different from images taken in air due to effects such as absorption and scattering, which attenuate light at different rates for different wavelengths. While this physical process is well described theoretically, the model depends on many parameters intrinsic to the water column as well as the structure of the scene. These factors make recovery of the parameters difficult without simplifying assumptions or field calibration; hence, restoration of underwater images is a non-trivial problem. Deep learning has demonstrated great success in modeling complex nonlinear systems, but it requires a large amount of training data, which is difficult to compile in deep-sea environments. Using WaterGAN, we generate a large training dataset of corresponding depth, in-air color images, and realistic underwater images. This data serves as input to a two-stage network for color correction of monocular underwater images. Our proposed pipeline is validated through testing on real data collected from both a pure water test tank and from underwater surveys collected in the field. Source code, sample datasets, and pretrained models are made publicly available.
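As a rough illustration of the wavelength-dependent attenuation the abstract describes, the sketch below applies a simplified attenuation-plus-backscatter model to an in-air RGB image and its depth map. The coefficient values and function name are illustrative assumptions, not parameters from WaterGAN itself, which learns this mapping adversarially rather than from a fixed physical model.

```python
import numpy as np

def simulate_underwater(in_air_rgb, depth_m,
                        eta=(0.35, 0.07, 0.04),        # assumed attenuation per channel (R, G, B), 1/m
                        backscatter=(0.05, 0.25, 0.35)):  # assumed veiling-light color
    """Apply a simplified attenuation + backscatter model to an in-air image.

    in_air_rgb: HxWx3 float array in [0, 1]
    depth_m:    HxW range-to-scene map in meters
    """
    out = np.empty_like(in_air_rgb)
    for c in range(3):
        transmission = np.exp(-eta[c] * depth_m)            # wavelength-dependent direct signal
        out[..., c] = (in_air_rgb[..., c] * transmission
                       + backscatter[c] * (1.0 - transmission))  # scattered veiling light
    return np.clip(out, 0.0, 1.0)
```

In practice the attenuation and backscatter terms vary with the water column and scene, which is exactly why the paper learns the image formation from data instead of fixing such coefficients.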
Abstract-This paper reports the novel insight that the simultaneous localization and mapping (SLAM) information matrix is exactly sparse in a delayed-state framework. Such a framework is used in view-based representations of the environment that rely upon scan-matching raw sensor data to obtain virtual observations of robot motion with respect to a place it has previously visited. The exact sparsity of the delayed-state information matrix is in contrast to other recent feature-based SLAM information algorithms, such as the sparse extended information filter or the thin junction-tree filter, since these methods must make approximations in order to force the feature-based SLAM information matrix to be sparse. The benefit of the exact sparsity of the delayed-state framework is that it allows one to take advantage of the information-space parameterization without incurring any sparse-approximation error; therefore, it can produce results equivalent to the full-covariance solution. The approach is validated experimentally using monocular imagery from two datasets: a test-tank experiment with ground truth and a remotely operated vehicle survey of the RMS Titanic.

Index Terms-Information filters, Kalman filtering, machine vision, mobile robot motion planning, mobile robots, recursive estimation, robot vision systems, simultaneous localization and mapping (SLAM), underwater vehicles.
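To make the "exactly sparse" claim concrete, the sketch below shows a state-augmentation step for a delayed-state information filter under an assumed linearized motion model x_{k+1} = F x_k + b + w. Only the blocks shared by the newest two poses are touched, so every other off-diagonal block stays identically zero. This is a minimal illustration, not the paper's implementation.

```python
import numpy as np

def augment_with_new_pose(Lam, eta, F, b, Q_inv):
    """Augment a delayed-state information filter with a new pose x_{k+1}.

    Lam, eta : information matrix / vector over the stacked poses x_0..x_k
    F, b     : linearized motion model, x_{k+1} = F x_k + b + w,  w ~ N(0, Q)
    Q_inv    : inverse of the process-noise covariance Q
    """
    d, n = F.shape[0], Lam.shape[0]
    Lam_new = np.zeros((n + d, n + d))
    Lam_new[:n, :n] = Lam
    eta_new = np.zeros(n + d)
    eta_new[:n] = eta
    # The motion prior couples only the previous pose x_k and the new pose x_{k+1}.
    Lam_new[n - d:n, n - d:n] += F.T @ Q_inv @ F
    Lam_new[n - d:n, n:]      += -F.T @ Q_inv
    Lam_new[n:, n - d:n]      += -Q_inv @ F
    Lam_new[n:, n:]           += Q_inv
    eta_new[n - d:n]          += -F.T @ Q_inv @ b
    eta_new[n:]               += Q_inv @ b
    return Lam_new, eta_new
```

A scan-matching constraint between two delayed states likewise adds information only to the blocks of the two poses it relates, which is why no sparsification approximation is ever needed.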
As light is transmitted from subject to observer, it is absorbed and scattered by the medium it passes through. In media with large suspended particles, such as fog or turbid water, scattering can drastically degrade image quality. In this paper we present an algorithm for removing the effects of light scattering, referred to as dehazing, in underwater images. Our key contribution is a simple yet effective prior that exploits the strong difference in attenuation among the three image color channels in water to estimate the depth of the scene. We then use this estimate to reduce the spatially varying effect of haze in the image. Our method works with a single image and does not require any specialized hardware or prior knowledge of the scene. As a by-product of the dehazing process, an up-to-scale depth map of the scene is produced. We present results over multiple real underwater images and over a controlled test set where the target distance and true colors are known.
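A minimal sketch of the kind of attenuation-difference prior the abstract describes is given below: because red light attenuates much faster in water than green or blue, the gap between the local maxima of the red channel and of the green/blue channels shrinks with distance and can serve as an up-to-scale depth cue. The patch size and normalization here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def attenuation_prior_depth(img_rgb, patch=15):
    """Rough relative-depth estimate from the red vs. green/blue attenuation gap.

    img_rgb: HxWx3 float image in [0, 1].
    Returns an up-to-scale map in [0, 1]: 0 = nearest, 1 = farthest.
    """
    max_r  = maximum_filter(img_rgb[..., 0], size=patch)               # local max of red
    max_gb = maximum_filter(img_rgb[..., 1:].max(axis=2), size=patch)  # local max of green/blue
    diff = max_r - max_gb                    # large near the camera, small/negative far away
    depth_rel = 1.0 - (diff - diff.min()) / (diff.max() - diff.min() + 1e-8)
    return depth_rel
```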
This paper documents a large-scale, long-term autonomy dataset for robotics research collected on the University of Michigan's North Campus. The dataset consists of omnidirectional imagery, 3D lidar, planar lidar, GPS, and proprioceptive sensors for odometry collected using a Segway robot. The dataset was collected to facilitate research focusing on long-term autonomous operation in changing environments. It is composed of 27 sessions spaced approximately biweekly over the course of 15 months. The sessions repeatedly explore the campus, both indoors and outdoors, on varying trajectories, and at different times of the day across all four seasons. This allows the dataset to capture many challenging elements, including moving obstacles (e.g., pedestrians, bicyclists, and cars), changing lighting, varying viewpoint, seasonal and weather changes (e.g., falling leaves and snow), and long-term structural changes caused by construction projects. To further facilitate research, we also provide ground-truth pose for all sessions in a single frame of reference.
This paper reports on an algorithm for automatic, targetless, extrinsic calibration of a lidar and optical camera system based upon the maximization of mutual information between the sensor-measured surface intensities. The proposed method is completely data-driven and does not require any fiducial calibration targets, making in situ calibration easy. We calculate the Cramér-Rao lower bound (CRLB) of the estimated calibration parameter variance, and we show experimentally that the sample variance of the estimated parameters empirically approaches the CRLB when the amount of data used for calibration is sufficiently large. Furthermore, we compare the calibration results to independent ground-truth (where available) and observe that the mean error empirically approaches zero as the amount of data used for calibration is increased, thereby suggesting that the proposed estimator is a minimum variance unbiased estimate of the calibration parameters. Experimental results are presented for three different lidar-camera systems: (i) a three-dimensional (3D) lidar and omnidirectional camera, (ii) a 3D time-of-flight sensor and monocular camera, and (iii) a 2D lidar and monocular camera.
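The core of the method is an information-theoretic objective: project the lidar points into the image under a candidate extrinsic transform and score how predictive their reflectivities are of the grayscale values they land on. A minimal sketch of that score is below, assuming the projection step has already produced co-registered sample pairs; in the paper this objective is maximized over all six extrinsic parameters, and the joint histogram is smoothed for a better-behaved cost surface.

```python
import numpy as np

def mutual_information(lidar_refl, cam_gray, bins=256):
    """MI between lidar reflectivity and camera grayscale samples (both in [0, 255]).

    lidar_refl, cam_gray: 1D arrays of co-registered samples, i.e. the reflectivity
    of each lidar point and the grayscale intensity of the pixel it projects to under
    a candidate extrinsic transform. A correct calibration should maximize this score.
    """
    joint, _, _ = np.histogram2d(lidar_refl, cam_gray, bins=bins,
                                 range=[[0, 255], [0, 255]])
    p_xy = joint / joint.sum()                     # joint distribution
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)  # marginals
    nz = p_xy > 0
    return np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x[:, None] * p_y[None, :])[nz]))
```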
Legged robots require knowledge of pose and velocity in order to maintain stability and execute walking paths. Current solutions either rely on vision data, which is susceptible to environmental and lighting conditions, or on fusion of kinematic and contact data with measurements from an inertial measurement unit (IMU). In this work, we develop a contact-aided invariant extended Kalman filter (InEKF) using the theory of Lie groups and invariant observer design. This filter combines contact-inertial dynamics with forward kinematic corrections to estimate pose and velocity along with all current contact points. We show that the error dynamics follows a log-linear autonomous differential equation with several important consequences: (a) the observable state variables can be rendered convergent with a domain of attraction that is independent of the system's trajectory; (b) unlike the standard EKF, neither the linearized error dynamics nor the linearized observation model depends on the current state estimate, which (c) leads to improved convergence properties and (d) a local observability matrix that is consistent with the underlying nonlinear system. Furthermore, we demonstrate how to include IMU biases, add/remove contacts, and formulate both world-centric and robo-centric versions. We compare the convergence of the proposed InEKF with the commonly used quaternion-based EKF through both simulations and experiments on a Cassie-series bipedal robot. Filter accuracy is analyzed using motion capture, while a LiDAR mapping experiment provides a practical use case. Overall, the developed contact-aided InEKF provides better performance in comparison with the quaternion-based EKF as a result of exploiting symmetries present in the system.

...changes in lighting as well as the operating environment. It is therefore beneficial to develop a low-level state estimator that fuses data only from proprioceptive sensors to form accurate high-frequency state estimates. This approach was taken by Bloesch et al. [15] when developing a quaternion-based extended Kalman filter (QEKF) that combines inertial, contact, and kinematic data to estimate the robot's base pose, velocity, and a number of contact states. In this article, we expand upon these ideas to develop an invariant extended Kalman filter (InEKF) that has improved convergence and consistency properties, allowing for a more robust observer that is suitable for long-term autonomy. The theory of invariant observer design is based on the estimation error being invariant under the action of a matrix Lie group [1,20], which has recently led to the development of the InEKF [18,8,10,11] with successful applications and promising results in simultaneous localization and mapping [8,86] and aided inertial navigation systems [4,5,8,83]. The invariance of the estimation error with respect to a Lie group action is referred to as a symmetry of the system [5]. Summarized briefly, Barrau and Bonnabel [10] showed that if the state is defined on a Lie group, and the dynamics satisfy a particular "group affin...
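For readers unfamiliar with the Lie-group machinery, the sketch below shows one common convention for the matrix-group state used by such filters: orientation, velocity, and position embedded in an SE_2(3) element, together with a right-invariant error whose logarithm evolves linearly and independently of the state estimate. The helper names and error convention are illustrative, not the authors' code.

```python
import numpy as np

def se23_matrix(R, v, p):
    """Embed orientation R (3x3), velocity v (3,), and position p (3,) in an SE_2(3) element."""
    X = np.eye(5)
    X[:3, :3] = R
    X[:3, 3] = v
    X[:3, 4] = p
    return X

def right_invariant_error(X_hat, X):
    """One common convention for the right-invariant error, eta = X_hat * X^{-1}.

    The property exploited by the InEKF is that the logarithm of this error obeys a
    linear ("log-linear") differential equation whose coefficients do not depend on
    the state estimate X_hat, unlike the standard EKF linearization."""
    return X_hat @ np.linalg.inv(X)
```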
Abstract-This paper reports on the problem of map-based visual localization in urban environments for autonomous vehicles. Self-driving cars have become a reality on roadways and are going to be a consumer product in the near future. One of the most significant roadblocks to autonomous vehicles is the prohibitive cost of the sensor suites necessary for localization. The most common sensor on these platforms, a three-dimensional (3D) light detection and ranging (LIDAR) scanner, generates dense point clouds with measures of surface reflectivity, which other state-of-the-art localization methods have shown to be capable of centimeter-level accuracy. Alternatively, we seek to obtain comparable localization accuracy with significantly cheaper, commodity cameras. We propose to localize a single monocular camera within a 3D prior ground map generated by a survey vehicle equipped with 3D LIDAR scanners. To do so, we exploit a graphics processing unit to generate several synthetic views of our belief environment. We then seek to maximize the normalized mutual information between our real camera measurements and these synthetic views. Results are shown for two different datasets, a 3.0 km and a 1.5 km trajectory, where we also compare against the state of the art in LIDAR map-based localization.
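A minimal sketch of a normalized mutual information score for comparing a real camera image against a synthetic view rendered from the prior map is given below. The bin count and the grayscale assumption are illustrative choices; in the full system this score is evaluated over many candidate poses, with the synthetic views rendered on the GPU.

```python
import numpy as np

def normalized_mutual_information(img_a, img_b, bins=64):
    """NMI = (H(A) + H(B)) / H(A, B) between two same-sized uint8 grayscale images,
    e.g. a real camera frame and a synthetic view rendered at a candidate pose.
    The pose hypothesis with the highest NMI is the best registration."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    p_xy = joint / joint.sum()
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    return (entropy(p_x) + entropy(p_y)) / entropy(p_xy.ravel())
```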