Without knowledge of the absolute baseline between images, the scale of a map from a single-camera simultaneous localization and mapping system is subject to calamitous drift over time. We describe a monocular approach that, in addition to point measurements, also considers object detections to resolve this scale ambiguity and drift. By placing a prior on the size of the objects, the scale estimation can be seamlessly integrated into a bundle adjustment. When object observations are available, the local scale of the map is then determined jointly with the camera pose in local adjustments. Unlike many previous visual odometry methods, our approach does not impose restrictions such as approximately constant camera height or planar roadways, and is therefore applicable to a much wider range of applications. We evaluate our approach on the KITTI dataset and show that it reduces scale drift over long-range outdoor sequences with a total length of 40 km. As the size of the objects is known absolutely, metric accuracy is obtained for all sequences. Qualitative evaluation is also performed on video footage from a hand-held camera.
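To make the role of the size prior concrete, here is a minimal sketch of how such a term might enter the bundle-adjustment objective; the notation below is illustrative, not taken from the paper. The usual reprojection cost over camera poses $P_i$ and map points $X_k$ is augmented with a penalty tying each reconstructed object's size $s_j$ to a class-specific prior with mean $\bar{s}_{c(j)}$ and variance $\sigma_{c(j)}^2$:

$$
E = \sum_{i,k} \rho\!\left( \left\| \mathbf{x}_{ik} - \pi(P_i, X_k) \right\|^2 \right)
  + \sum_{j} \frac{\left( s_j - \bar{s}_{c(j)} \right)^2}{\sigma_{c(j)}^2},
$$

where $\mathbf{x}_{ik}$ is the observation of point $X_k$ in image $i$, $\pi$ the projection function, $\rho$ a robust kernel, and $c(j)$ the detected class of object $j$. Because the second term is expressed in metric units, minimizing $E$ jointly over poses, points, and object sizes anchors the otherwise free scale of the map.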
For many applications in low-power real-time robotics, stereo cameras are the sensors of choice for depth perception, as they are typically cheaper and more versatile than their active counterparts. Their biggest drawback, however, is that they do not directly sense depth maps; instead, these must be estimated through data-intensive processes. Appropriate algorithm selection therefore plays an important role in achieving the desired performance characteristics. Motivated by applications in space and mobile robotics, we implement and evaluate an FPGA-accelerated adaptation of the ELAS algorithm. Despite offering one of the best trade-offs between efficiency and accuracy, ELAS has only been shown to run at 1.5–3 fps on a high-end CPU. Our system preserves the key properties of the original algorithm, such as the slanted-plane priors, but achieves a frame rate of 47 fps while consuming under 4 W of power. Unlike previous FPGA-based designs, we take advantage of both components of the CPU/FPGA system-on-chip to showcase the strategy necessary to accelerate more complex and computationally diverse algorithms for such low-power, real-time systems.
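For context, the "slanted-plane priors" refer to ELAS's generative model (Geiger et al., "Efficient Large-Scale Stereo Matching"): sparse, reliably matched support points are triangulated, and the resulting piecewise-planar mesh supplies a per-pixel disparity prior for dense matching. A sketch of that prior, paraphrased from the original formulation (exact constants and the support-neighbourhood term are omitted):

$$
p(d_n \mid S, \mathbf{o}_n) \propto \gamma + \exp\!\left( -\frac{\left( d_n - \mu(S, \mathbf{o}_n) \right)^2}{2\sigma^2} \right),
$$

where $\mu(S, \mathbf{o}_n)$ is the disparity interpolated at pixel $\mathbf{o}_n$ from the Delaunay triangulation of the support points $S$, and $\gamma$ keeps a uniform floor so that the dense match can deviate from the prior. The mix of irregular computation (matching and triangulating support points) and regular per-pixel computation is the kind of diversity that plausibly motivates splitting the work between CPU and FPGA.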
Over the years, maritime surveillance has become increasingly important due to the recurrence of piracy. While surveillance has traditionally been a manual task, using crew members in lookout positions on parts of the ship, much work is being done to automate it using digital cameras coupled with a computer running image-processing techniques that intelligently track objects in the maritime environment. One such technique is level-set segmentation, which evolves a contour toward objects of interest in a given image. This method works well but gives incorrect segmentation results when a target object is corrupted in the image. This paper explores the possibility of factoring prior knowledge of a ship's shape into level-set segmentation to improve results, a concept previously unaddressed in the maritime surveillance problem. It is shown that the developed video tracking system outperforms level-set-based systems that do not use prior shape knowledge, working well even where these systems fail.
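One standard way to fold a shape prior into a level-set framework, in the spirit of Cremers-style formulations, is to add a term penalizing deviation of the evolving embedding function from the signed-distance function of the prior shape; whether this matches the paper's exact construction is an assumption, and the symbols are illustrative:

$$
E(\phi) = E_{\text{data}}(\phi) + \lambda \int_{\Omega} \left( \phi(\mathbf{x}) - \phi_0\!\left( T(\mathbf{x}) \right) \right)^2 d\mathbf{x},
$$

where $\phi$ is the evolving level-set function, $\phi_0$ the signed-distance function of the prior ship silhouette, $T$ a similarity transform aligning the prior to the image, and $\lambda$ a weight. The shape term pulls the contour back toward a plausible ship outline when the data term is corrupted, e.g., by occlusion or sea clutter.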
Monocular visual localization and mapping algorithms can estimate the environment only up to scale, a degree of freedom that leads to scale drift, difficulty closing loops, and eventual failure. This paper describes an image-driven approach to scale-drift correction that uses a convolutional neural network to infer the speed of the camera from successive monocular video frames. We obtain continuous drift correction, avoiding the need for explicit higher-level representations of the map to resolve scale. We also propose a novel method of including speed estimates as a regularizer in bundle adjustment, which avoids the pitfalls of a sudden imposition of scale knowledge. We demonstrate our approach on long-distance sequences for which ground truth is available, and find the output to be essentially free of scale drift. We compare its performance with a number of other methods for scale-drift correction from monocular data and show that our solution achieves more accurate results.
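A minimal sketch of how per-frame speed estimates could act as a bundle-adjustment regularizer; again the notation is illustrative rather than the paper's own. With camera centres $\mathbf{t}_i$, CNN speed estimates $\hat{v}_i$, and frame intervals $\Delta t_i$, a soft penalty on each inter-frame baseline is added to the reprojection cost:

$$
E = E_{\text{reproj}} + \sum_{i} w_i \left( \left\| \mathbf{t}_{i+1} - \mathbf{t}_i \right\| - \hat{v}_i \,\Delta t_i \right)^2,
$$

where $w_i$ weights each speed residual, e.g., by the network's confidence. Because the penalty is soft, scale information is blended in gradually at every adjustment rather than imposed all at once, which is the pitfall the abstract alludes to.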