Robot localization remains a challenging task in GPS-denied environments. State estimation approaches based on local sensors, e.g. cameras or IMUs, are drift-prone on long-range missions because error accumulates. In this study, we address this problem by localizing image observations in a 2D multi-modal geospatial map. We introduce the cross-scale dataset and a methodology to produce additional data from cross-modality sources. We propose a framework that learns cross-scale visual representations without supervision. Experiments are conducted on data from two different domains, underwater and aerial. In contrast to existing studies in cross-view image geo-localization, our approach a) performs better on smaller-scale multi-modal maps; b) is more computationally efficient for real-time applications; c) can serve directly in concert with state estimation pipelines. Our code and data are released at https://github.com/tyz1030/CroScaleRep.git
I. INTRODUCTION
Geo-localization plays a key role in autonomous and robotic systems exploring a priori unknown environments in the wild. To achieve better localization accuracy, a wide range of sensors has been used on today's field robots. Depending on their reference frame, sensors can generally be categorized as local or global. Local sensors, e.g. cameras and inertial measurement units (IMUs), observe the environment in a local coordinate frame. Global sensors, e.g. global positioning systems (GPS), barometers, and magnetometers, provide measurements in fixed global frames. While local sensors give high-precision local measurements, their error accumulates as drift over time; global sensors are noisier but do not suffer from the same drift effects when localizing the vehicle. Algorithmically combining both kinds of sensors achieves locally accurate and globally drift-free performance on long-range tasks [1].
However, there are many scenarios where global information is unavailable or only partially available, such as underwater, underground, or other GPS-denied environments. Taking underwater as an example, neither GPS nor land-based station towers can be accessed, since electromagnetic waves are heavily attenuated in water. Acoustic localization is degraded by variations in salinity or temperature in the water body. Magnetometers and depth sensors are reliable global sensors underwater; however, they only provide measurements up to 4 degrees of freedom (DOFs)