“…For instance, P = 1 was adopted in [6], which assumed the raw data from a LI-DAR sensor was converted to features Θ t,1 . In [7], P = 3 distinct feature modalities were adopted: Θ t,1 with positions from a GNSS (GPS) receiver, Θ t,2 with resampled images from RGB cameras and Θ t,3 with features from LIDAR [8].…”