“…Depending on the type of map they construct, conventional RGB-D SLAM systems are commonly categorised as sparse SLAM [1], [2], dense SLAM [3]-[5], or hybrid SLAM [6], [7]. With advances in deep neural networks (DNNs), much attention has since been paid to improving various aspects of SLAM using semantic information extracted by DNNs, such as meaningful mapping [8]-[10], dynamic tracking [11]-[13], and relocalisation [14], [15]. Although these systems have demonstrated promising tracking and reconstruction accuracy, they suffer from a large video memory (VRAM) footprint for storing the reconstructed map and a high computational cost when modifying the map on the fly.…”