“…Motivated by the advances of deep learning and Convolutional Neural Networks (CNNs) for scene understanding, there have been many semantic SLAM techniques exploiting this information using cameras [5], [30], cameras + IMU data [4], stereo cameras [9], [14], [17], [32], [37], or RGB-D sensors [3], [18], [19], [25], [26], [28], [38]. Most of these approaches were only applied indoors and use either an object detector or a semantic segmentation of the camera image.…”