Over the past decade deep learning has driven progress in 2D image understanding. Despite these advancements, techniques for automatic 3D sensed data understanding, such as point clouds, is comparatively immature. However, with a range of important applications from indoor robotics navigation to national scale remote sensing there is a high demand for algorithms that can learn to automatically understand and classify 3D sensed data. In this paper we review the current state-of-theart deep learning architectures for processing unstructured Euclidean data. We begin by addressing the background concepts and traditional methodologies. We review the current main approaches including; RGB-D, multi-view, volumetric and fully end-to-end architecture designs. Datasets for each category are documented and explained. Finally, we give a detailed discussion about the future of deep learning for 3D sensed data, using literature to justify the areas where future research would be most valuable.
Structure from Motion (SfM) is a tool being increasingly utilised in geosciences for high-resolution three-dimensional mapping of landscapes. However, a number of authors have demonstrated that broad-scale systematic deformations, in the form of ‘doming’ and ‘bowling’, can occur when applied to linear (low-amplitude, feature-limited) topographies. In such contexts, a more rigorous lens calibration and ground control point acquisition process is required, which means that application of SfM to environments such as tidal flats or desert plains can be challenging. Uncertainties in elevation models generated through SfM were investigated here in the context of the low elevation, micro-topographic environment of saltmarsh. Eight digital surface models (DSMs) were generated for a saltmarsh site in the Deben Estuary (Suffolk, UK) using imagery acquired by a low-cost consumer grade unmanned aerial system (UAS). The results provide clear illustration of the systematic bowling effect following self-calibration during bundle adjustment. This was due to poor estimations of distortion parameters in the camera model. Deformation was most pronounced when UAS-GPS data were used for georeferencing. The use of dGPS-determined ground control points improved the DSM, but did not fully mitigate the deformations. By introducing a pre-calibrated model, derived using a typical checkerboard routine, deformation was significantly mitigated. These results were tested in both the commercial Agisoft PhotoScan® and open-source Micmac software. When self-calibration was used, Micmac generated significantly more accurate DSMs because a more complex lens distortion model could be implemented. The results show that when mapping flat topographies, pre-calibration of the camera model out-performs self-calibration. However, if pre-calibration is not possible, a complex distortion model (such as Micmac’s Four model) can be utilised to limit deformation. The results of the software analysis concluded there is no one-size fits all software solution, and therefore customisable open-source systems offer many potential benefits.
Robust and reliable automatic building detection and segmentation from aerial images/point clouds has been a prominent field of research in remote sensing, computer vision and point cloud processing for a number of decades. One of the largest issues associated with deep learning methods is the high quantity of data required for training. To help address this we present a method to improve public GIS building footprint labels by using Morphological Geodesic Active Contours (MorphGACs). We demonstrate by improving the quality of building footprint labels for detection and semantic segmentation, more robust and reliable models can be obtained. We evaluate these methods over a large UK-based dataset of 24556 images containing 169835 building instances. This is achieved by training several Mask/Faster R-CNN and RetinaNet deep convolutional neural networks. Networks are supplied with both RGB and fused RGB-lidar data. We offer quantitative analysis on the benefits of the inclusion of depth data for building segmentation. By employing both methods we achieve a detection accuracy of 0.92 (mAP@0.5) and segmentation f1 scores of 0.94 over a 4911 test images ranging from urban to rural scenes.
Bridges are a critical piece of infrastructure in the network of road and rail transport system. Many of the bridges in Norway (in Europe) are at the end of their lifespan, therefore regular inspection and maintenance are critical to ensure the safety of their operations. However, the traditional inspection procedures and resources required are so time consuming and costly that there exists a significant maintenance backlog. The central thrust of this paper is to demonstrate the significant benefits of adapting a Unmanned Aerial Vehicle (UAV)-assisted inspection to reduce the time and costs of bridge inspection and established the research needs associated with the processing of the (big) data produced by such autonomous technologies. In this regard, a methodology is proposed for analysing the bridge damage that comprises three key stages, (i) data collection and model training, where one performs experiments and trials to perfect drone flights for inspection using case study bridges to inform and provide necessary (big) data for the second key stage, (ii) 3D construction, where one built 3D models that offer a permanent record of element geometry for each bridge asset, which could be used for navigation and control purposes, (iii) damage identification and analysis, where deep learning-based data analytics and modelling are applied for processing and analysing UAV image data and to perform bridge damage performance assessment. The proposed methodology is exemplified via UAV-assisted inspection of Skodsberg bridge, a 140 m prestressed concrete bridge, in the Viken county in eastern Norway.
Recent developments in the field of deep learning for 3D data have demonstrated promising potential for end-to-end learning directly from point clouds. However, many real-world point clouds contain a large class im-balance due to the natural class im-balance observed in nature. For example, a 3D scan of an urban environment will consist mostly of road and façade, whereas other objects such as poles will be under-represented. In this paper we address this issue by employing a weighted augmentation to increase classes that contain fewer points. By mitigating the class im-balance present in the data we demonstrate that a standard PointNet++ deep neural network can achieve higher performance at inference on validation data. This was observed as an increase of F1 score of 19% and 25% on two test benchmark datasets; ScanNet and Semantic3D respectively where no class im-balance pre-processing had been performed. Our networks performed better on both highly-represented and under-represented classes, which indicates that the network is learning more robust and meaningful features when the loss function is not overly exposed to only a few classes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.