This paper presents a system for robust, large-scale topological localisation using Frequency-Modulated Continuous-Wave (FMCW) scanning radar. We learn a metric space for embedding polar radar scans using CNN and NetVLAD architectures traditionally applied to the visual domain. However, we tailor the feature extraction for more suitability to the polar nature of radar scan formation using cylindrical convolutions, anti-aliasing blurring, and azimuth-wise max-pooling; all in order to bolster the rotational invariance. The enforced metric space is then used to encode a reference trajectory, serving as a map, which is queried for nearest neighbours (NNs) for recognition of places at run-time. We demonstrate the performance of our topological localisation system over the course of many repeat forays using the largest radar-focused mobile autonomy dataset released to date, totalling 280 km of urban driving, a small portion of which we also use to learn the weights of the modified architecture. As this work represents a novel application for FMCW radar, we analyse the utility of the proposed method via a comprehensive set of metrics which provide insight into the efficacy when used in a realistic system, showing improved performance over the root architecture even in the face of random rotational perturbation.
We provide the theory and the system needed to create large-scale dense reconstructions for mobile-robotics applications: this stands in contrast to the object-centric reconstructions dominant in the literature. Our BOR2G system fuses data from multiple sensor modalities (cameras, lidars, or both) and regularizes the resulting 3D model. We use a compressed 3D data structure, which allows us to operate over a large scale. In addition, because of the paucity of surface observations by the camera and lidar sensors, we regularize over both two (camera depth maps) and three dimensions (voxel grid) to provide a local contextual prior for the reconstruction. Our regularizer reduces the median error between 27% and 36% in 7.3 km of dense reconstructions with a median accuracy between 4 and 8 cm. Our pipeline does not end with regularization. We take the unusual step to apply a learned correction mechanism that takes the global context of the reconstruction and adjusts the constructed mesh, addressing errors that are pathological to the first-pass camera-derived reconstruction. We evaluate our system using the Stanford Burghers of Calais, Imperial College ICL-NUIM, Oxford Broad Street (released with this paper), and the KITTI datasets. These latter datasets see us operating at a combined scale and accuracy not seen in the literature. We provide statistics for the metric errors in all surfaces created compared with those measured with 3D lidar as ground truth. We demonstrate our system in practice by reconstructing the inside of the EUROfusion Joint European Torus (JET) fusion reactor, located at the Culham Centre for Fusion Energy (UK Atomic Energy Authority) in Oxfordshire.
Building good 3D maps is a challenging and expensive task, which requires high-quality sensors and careful, time-consuming scanning. We seek to reduce the cost of building good reconstructions by correcting views of existing low-quality ones in a post-hoc fashion using learnt priors over surfaces and appearance. We train a convolutional neural network model to predict the difference in inverse-depth from varying viewpoints of two meshes — one of low-quality that we wish to correct, and one of high-quality that we use as a reference. Our full model runs at 11.3[Formula: see text]Hz when aggregating four input views. In contrast to previous work, we pay attention to the problem of excessive smoothing in corrected meshes. We address this with a suitable network architecture, and introduce a loss-weighting mechanism that emphasizes edges in the prediction. Furthermore, smooth predictions result in geometrical inconsistencies. To deal with this issue, we present a loss function which penalizes re-projection differences that are not due to occlusions. Future applications of this work will incorporate semantic scene understanding in a multi-task learning setting. We explore the efficacy of the proposed system in terms of gross error correction and generalization capability by showing its performance in practice on a subset of the Kitti Odometry dataset, complete with a component-wise ablation study. We evaluate correctness and completeness measures of surface reconstruction across viewpoints and show that the proposed system is introspective in regions lacking sufficient high-quality supervision — indeed, models trained with geometric consistency loss create a lot more surface in areas that were not supervised, in one case filling in 67.97% or 8010[Formula: see text]m2 of an unlabeled input region. Finally, we assess the practical applicability of our method at large-scale by experiments over the full scope of the Kitti Odometry dataset. Broadly, as a measure of effectiveness, our model reduces gross errors by 45.3–77.5%, up to five times more than previous work. We also assess the practical applicability of our method to 3D reconstruction at large scales and find that compared to the baseline our model shows better stability in correctness when improving completeness of surfaces, and is effective in reducing median total error by up to 21.8[Formula: see text]cm.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.