This paper presents a mobile robot system with vision-based teleoperation for use in unstructured outdoor environments. Our system uses a combined camera-laser sensor, which provides suitable information for the operator in unstructured environments with homogeneous reflection, such as the surface of the frozen Lake Balaton. Human vision plays an essential role in visually guided teleoperation. A comprehensive overview of human 3D perception and the role of the different vision cues is given. The need for combined 2D-3D vision, our approach to it, and the first experimental teleoperation results are presented.
Three-dimensional, accurate, and up-to-date maps are essential for vehicles with autonomous capabilities, whose functionality is made possible by machine learning-based algorithms. Since these solutions require a tremendous amount of data for parameter optimization, simulation-to-reality (Sim2Real) methods have proven immensely useful for training data generation. For creating realistic models to be used for synthetic data generation, crowdsourcing techniques present a resource-efficient alternative. In this paper, we show that, using the Carla simulation environment, a crowdsourcing model can be created that mimics a multi-agent data gathering and processing pipeline. We developed a solution that yields dense point clouds based on monocular images and location information gathered by individual data acquisition vehicles. Our method provides scene reconstructions using the robust Structure-from-Motion (SfM) solution of Colmap. Moreover, we introduce a solution for synthesizing dense ground truth point clouds originating from the Carla simulator using a simulated data acquisition pipeline. We compare the results of the Colmap reconstruction with the reference point cloud after aligning them using the iterative closest point (ICP) algorithm. Our results show that a precise point cloud reconstruction was feasible with this crowdsourcing-based approach, with 54% of the reconstructed points having an error under 0.05 m, and a weighted root mean square error of 0.0449 m for the entire point cloud.
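To make the evaluation step concrete, the following is a minimal sketch of how an SfM reconstruction could be aligned to a reference cloud with ICP and scored with the metrics the abstract reports. The file names, correspondence threshold, and the use of Open3D are illustrative assumptions, not details from the paper, and a plain (unweighted) RMSE stands in for the paper's weighted variant.

```python
# Hypothetical sketch: align a Colmap reconstruction to a Carla-derived
# reference cloud with point-to-point ICP, then compute error metrics.
import numpy as np
import open3d as o3d

# Load the reconstructed and reference clouds (file names are assumptions).
reconstruction = o3d.io.read_point_cloud("colmap_reconstruction.ply")
reference = o3d.io.read_point_cloud("carla_ground_truth.ply")

# Point-to-point ICP refines an initial (here: identity) alignment.
result = o3d.pipelines.registration.registration_icp(
    reconstruction, reference,
    max_correspondence_distance=0.5,  # assumed threshold, in metres
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
reconstruction.transform(result.transformation)

# Nearest-neighbour distance from each reconstructed point to the reference.
distances = np.asarray(reconstruction.compute_point_cloud_distance(reference))

# Fraction of points with error under 0.05 m, and an unweighted RMSE.
inlier_ratio = np.mean(distances < 0.05)
rmse = np.sqrt(np.mean(distances ** 2))
print(f"points under 0.05 m: {inlier_ratio:.1%}, RMSE: {rmse:.4f} m")
```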
In this paper, a novel solution is introduced for visual Simultaneous Localization and Mapping (vSLAM) that is built up of Deep Learning components. The proposed architecture is a highly modular framework in which each component offers state-of-the-art results in its respective field of vision-based Deep Learning solutions. The paper shows that with the synergic integration of these individual building blocks, a functioning and efficient all-through deep neural (ATDN) vSLAM system can be created. The Embedding Distance Loss function is introduced, and the ATDN architecture is trained using it. The resulting system achieved 4.4% translation error and 0.0176 deg/m rotational error on a subset of the KITTI dataset. The proposed architecture can be used for efficient and low-latency autonomous driving (AD) aiding database creation as well as a basis for autonomous vehicle (AV) control.
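The abstract names the Embedding Distance Loss but does not give its formula; the sketch below is one plausible reading, assuming the loss penalizes the L2 distance between predicted and target embedding vectors. The class name, tensor shapes, and PyTorch usage are all assumptions for illustration, not the paper's definition.

```python
# Hypothetical sketch of an "Embedding Distance Loss", assumed here to be
# the mean Euclidean distance between predicted and target embeddings.
import torch
import torch.nn as nn

class EmbeddingDistanceLoss(nn.Module):
    """Penalizes the Euclidean distance between two embedding batches."""

    def forward(self, predicted: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Both tensors are (batch, dim); average the per-sample L2 distances.
        return torch.linalg.vector_norm(predicted - target, dim=-1).mean()

# Illustrative usage with random embeddings standing in for network outputs.
loss_fn = EmbeddingDistanceLoss()
pred = torch.randn(8, 128, requires_grad=True)
gt = torch.randn(8, 128)
loss = loss_fn(pred, gt)
loss.backward()
```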