Abstract: 3D semantic segmentation of a point cloud aims at assigning a semantic label to each point while utilizing and respecting the 3D representation of the data. Detailed 3D semantic segmentation of urban areas can assist policymakers, insurance companies, and governmental agencies in applications such as urban growth assessment, disaster management, and traffic supervision. The recent proliferation of remote sensing techniques has led to the production of high-resolution multimodal geospatial data. Nonetheless, currently, only lim…
“…To address this problem, a constrained network of points [21] has been adopted. The lost information was complemented by structure awareness [22], multimodal aggregation [23,24], an incremental approach [25], and fragment integrity [26]. However, the structural features and the performance of large-scale segmentation still need improvement.…”
The project was funded by the Jinan Science and Technology Bureau and undertaken by Qilu University of Technology (Shandong Academy of Sciences): "Machine vision-based online intelligent segmentation of foreign objects in liquids," grant 2019GXRC067.

Abstract: In the semantic segmentation of a point cloud, if the spatial structure correlation between the input features and coordinates is not fully considered, semantic segmentation errors can occur. We propose a spatial convolution method that makes full use of the characteristics of a multiscale spatial structure by combining local and global features; we call this method MLFNet. We also propose a multiscale feature framework. First, the point cloud is simplified by obtaining the weighted farthest points (down-sampling that combines farthest-point sampling with a weighted average). The neighborhood of each sampling point is then obtained by a KK octant search (an octant search optimized by k-nearest neighbors and a custom threshold), and feature information is extracted. The feature information is fed into the subsequent multilayer perceptron, and fusion of local context information is achieved. Finally, the fused features are max-pooled in multiple directions. Our method was tested on a self-made dataset and standard benchmark datasets (ModelNet40, ShapeNet, and Stanford Large-Scale 3D Indoor Spaces (S3DIS)). The segmentation accuracy was 0.937 on our dataset, two percentage points higher than the latest deep learning method. Our method also obtained a mean intersection over union of 0.867 on ShapeNet, which was 0.3 percentage points higher than the latest PointGrid. The accuracy on S3DIS was 0.8153, three percentage points higher than the latest spatial aggregation net. These semantic segmentation results verify the superiority of the proposed method.
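The abstract's down-sampling and neighborhood-search steps can be made concrete with a short sketch. The weighting scheme of the weighted farthest-point sampling and the exact "KK octant search" procedure are not detailed in the abstract, so the snippet below implements plain farthest-point sampling plus a k-nearest-neighbor search clipped by a distance threshold as simplified stand-ins; all function names and parameters are illustrative assumptions.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Plain farthest-point sampling (FPS) over an (N, 3) array.

    The paper augments FPS with a weighted average over each selected
    point's neighborhood; that weighting is not specified in the
    abstract, so only the standard FPS core is shown here.
    """
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=int)
    dist = np.full(n, np.inf)          # distance to nearest selected point
    selected[0] = np.random.randint(n)
    for i in range(1, n_samples):
        diff = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum('ij,ij->i', diff, diff))
        selected[i] = np.argmax(dist)  # point farthest from the sample set
    return points[selected]

def knn_neighborhood(points, query, k=16, radius=0.5):
    """k nearest neighbors of `query`, clipped by a distance threshold.

    A simplified stand-in for the paper's octant search optimized by
    k-NN and a custom threshold.
    """
    d = np.linalg.norm(points - query, axis=1)
    idx = np.argsort(d)[:k]
    return points[idx[d[idx] <= radius]]
```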
“…By applying RGB features, overall accuracy increased by 2%, from 86% to 88%. Additionally, Poliyapram et al. [24] proposed an end-to-end point-wise LiDAR and image multimodal fusion network (PMNet) for classification of an ALS point cloud of Osaka city in combination with aerial-image RGB features. Their results show that combining intensity and RGB features could improve overall accuracy from 65% to 79%, while the performance in identifying buildings improved by 4%.…”
Classification of aerial point clouds with high accuracy is significant for many geographical applications, but not trivial, as the data are massive and unstructured. In recent years, deep learning for 3D point cloud classification has been actively developed and applied, but notably for indoor scenes. In this study, we implement the point-wise deep learning method Dynamic Graph Convolutional Neural Network (DGCNN) and extend its classification application from indoor scenes to airborne point clouds. This study proposes an approach to provide cheap training samples for point-wise deep learning using an existing 2D base map. Furthermore, essential features and spatial contexts to effectively classify airborne point clouds colored by an orthophoto are also investigated, in particular to deal with class imbalance and relief displacement in urban areas. Two airborne point cloud datasets of different areas are used: Area-1 (city of Surabaya, Indonesia) and Area-2 (cities of Utrecht and Delft, the Netherlands). Area-1 is used to investigate different input feature combinations and loss functions. The point-wise classification for four classes achieves a remarkable result, with 91.8% overall accuracy when using the full combination of spectral color and LiDAR features. For Area-2, different block size settings (30, 50, and 70 m) are investigated. It is found that using an appropriate block size of, in this case, 50 m improves the classification up to 93% overall accuracy, but does not necessarily ensure better classification results for each class. Based on the experiments on both areas, we conclude that using DGCNN with proper settings is able to provide results close to production quality.
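The abstract reports experimenting with loss functions to handle class imbalance but does not name the loss finally chosen. A common baseline for point-wise classification is cross-entropy weighted by inverse class frequency; the sketch below is illustrative, not the authors' exact formulation.

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    """Per-class weights inversely proportional to class frequency.

    One standard remedy for class imbalance ("balanced" weighting:
    n_samples / (n_classes * count)); an assumption for illustration.
    """
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * np.maximum(counts, 1.0))

def weighted_cross_entropy(probs, labels, weights):
    """Mean weighted negative log-likelihood over points.

    probs:  (N, C) softmax outputs; labels: (N,) integer class ids.
    """
    eps = 1e-12
    per_point = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return float(np.mean(weights[labels] * per_point))
```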
“…Some of the normalized features, which are visible in both FLIR and LLTV cameras and fed to the network, are straight edges, winding edges, anisotropy, and contrast information from each image. In [40], the authors propose a novel deep-learning-based LiDAR and image fusion neural network (PMNet) for extracting meaningful information from aerial images and 3D point clouds. The fusion procedure uses spatial correspondence (point-wise fusion), which is done at the feature level and shows improved performance with low memory usage and fewer computational parameters.…”
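The point-wise, feature-level fusion described in this snippet can be illustrated as follows. Assuming an axis-aligned orthophoto with a known origin and pixel size (an assumption for illustration; PMNet's actual correspondence step is more involved), each point is projected into the image and its sampled RGB is concatenated with the per-point LiDAR features.

```python
import numpy as np

def pointwise_rgb_fusion(points, lidar_feats, image, origin, pixel_size):
    """Attach image RGB to each point via spatial correspondence, then
    concatenate at the feature level, in the spirit of PMNet's
    point-wise fusion.

    points:      (N, 3) x, y, z in map coordinates
    lidar_feats: (N, F) per-point features (e.g. intensity, returns)
    image:       (H, W, 3) orthophoto; origin = top-left map coordinate
    """
    # Project map coordinates to pixel indices (top-left origin,
    # y decreasing downward -- typical orthophoto convention).
    cols = ((points[:, 0] - origin[0]) / pixel_size).astype(int)
    rows = ((origin[1] - points[:, 1]) / pixel_size).astype(int)
    rows = np.clip(rows, 0, image.shape[0] - 1)
    cols = np.clip(cols, 0, image.shape[1] - 1)
    rgb = image[rows, cols].astype(float) / 255.0      # per-point color
    return np.concatenate([lidar_feats, rgb], axis=1)  # (N, F + 3)
```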
The stabilization and validation process of the measured position of objects is an important step for high-level perception functions and for the correct processing of sensory data. The goal of this process is to detect and handle inconsistencies between different sensor measurements that result from the perception system. The aggregation of the detections from different sensors consists of combining the sensor data in one common reference frame for each identified object, leading to the creation of a super-sensor. The result of the data aggregation may contain errors such as false detections, misplaced object cuboids, or an incorrect number of objects in the scene. The stabilization and validation process is focused on mitigating these problems. The current paper proposes four contributions for solving the stabilization and validation task for autonomous vehicles, using the following sensors: trifocal camera, fisheye camera, long-range RADAR (radio detection and ranging), and 4-layer and 16-layer LIDARs (light detection and ranging). We propose two original data association methods used in the sensor fusion and tracking processes. The first data association algorithm is created for tracking LIDAR objects and combines multiple appearance and motion features in order to exploit the available information for road objects. The second novel data association algorithm is designed for trifocal camera objects and has the objective of finding measurement correspondences to sensor-fused objects such that the super-sensor data are enriched by adding semantic class information. The implemented trifocal object association solution uses a novel polar association scheme combined with a decision tree to find the best hypothesis–measurement correlations. Another contribution we propose, for stabilizing the position and unpredictable behavior of road objects provided by multiple types of complementary sensors, is a fusion approach based on the Unscented Kalman Filter and a single-layer perceptron. The last novel contribution is related to the validation of the 3D object position, which is solved using a fuzzy logic technique combined with a semantic segmentation image. The proposed algorithms achieve real-time performance, with a cumulative running time of 90 ms, and have been evaluated using ground truth data extracted from a high-precision GPS (global positioning system) with 2 cm accuracy, obtaining an average error of 0.8 m.
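To make the UKF-based fusion step concrete, here is a minimal sketch using the third-party filterpy library, tracking a single road object under a constant-velocity model and fusing position measurements from two sensors in turn. All parameters (state layout, noise levels, sigma-point settings) are illustrative assumptions, and the paper's perceptron-learned fusion component is omitted.

```python
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

# Constant-velocity motion in 2D: state = [x, vx, y, vy].
def fx(x, dt):
    """Process model: propagate position by velocity over dt."""
    F = np.array([[1, dt, 0, 0],
                  [0, 1,  0, 0],
                  [0, 0,  1, dt],
                  [0, 0,  0, 1]], dtype=float)
    return F @ x

def hx(x):
    """Measurement model: sensors observe position only."""
    return x[[0, 2]]

dt = 0.05  # 20 Hz sensor cycle (illustrative)
points = MerweScaledSigmaPoints(n=4, alpha=0.1, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=4, dim_z=2, dt=dt, hx=hx, fx=fx,
                            points=points)
ukf.x = np.array([0.0, 5.0, 0.0, 0.0])  # initial state guess
ukf.Q *= 0.1                             # process noise (assumed)
ukf.R = np.diag([0.5, 0.5])              # measurement noise (assumed)

# Fuse position measurements from two sensors by running the
# predict/update cycle per measurement in a common reference frame.
for z in [np.array([0.26, 0.01]), np.array([0.24, -0.02])]:
    ukf.predict()
    ukf.update(z)
print(ukf.x)  # stabilized state estimate
```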