2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.00466

PointPainting: Sequential Fusion for 3D Object Detection

Cited by 599 publications (391 citation statements). References 23 publications.
“…Then, they removed the 3D convolutional module and processed the pseudo-image into a high-level representation with 2D convolutional blocks alone. PointPainting [16] was an effective sequential fusion method that used semantic segmentation predictions from RGB images to enhance point-cloud features. These one-stage 3D detection heads adopted a set of predefined 3D anchor boxes.…”
Section: Related Work
mentioning confidence: 99%
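The painting step these statements describe — projecting each lidar point into the image and appending the segmentation scores of the pixel it lands on — can be sketched roughly as follows. This is a minimal sketch, assuming a calibrated (3, 4) lidar-to-image projection matrix and per-pixel softmax scores from an arbitrary segmentation network; the function and variable names are illustrative, not taken from the paper's code:

```python
import numpy as np

def paint_points(points, seg_scores, proj_matrix):
    """Append per-pixel class scores to each lidar point ("painting").

    points:      (N, 4) lidar points (x, y, z, reflectance)
    seg_scores:  (H, W, C) softmax scores from an image segmentation net
    proj_matrix: (3, 4) lidar-to-image projection matrix
    Returns (N, 4 + C) painted points.
    """
    H, W, C = seg_scores.shape
    # Homogeneous lidar coordinates, projected onto the image plane
    hom = np.hstack([points[:, :3], np.ones((len(points), 1))])  # (N, 4)
    uvw = hom @ proj_matrix.T                                    # (N, 3)
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    # Nearest-pixel lookup, clamped to the image bounds
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    # Concatenate each point's class scores onto its raw features
    return np.hstack([points, seg_scores[v, u]])
```

Because the fusion is sequential, any segmentation network can produce `seg_scores`; the painted points then feed an unmodified lidar detector expecting 4 + C input channels.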
“…In the bird's eye view evaluation, the VFE detection results achieved (91.58, 85.83, 80.54) on the three difficulty levels, respectively. Although our one-stage anchor-free network does not need any prior information about anchor boxes during training or prediction, it achieves the same performance as anchor-based one-stage 3D detectors such as SCNet [39], SECOND [4], and PointPainting [16].…”
Section: Experiments on the KITTI Test Set
mentioning confidence: 99%
“…If the raw sensor data are not well aligned in the early stage, the resulting feature dislocation leads to heavy performance degradation. Using the coordinate relationship between the two sensors, PointPainting [5] and PI-RCNN [6] project image semantic segmentation into point-cloud space via a projection matrix. Although this early fusion lets the network handle the aligned two-modality information as a whole without modality-specific adjustment, early-stage fusion also conveys noise from one modality to the other.…”
Section: Introduction
mentioning confidence: 99%
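As a concrete illustration of the projection matrix mentioned above, the lidar-to-image mapping is typically composed from the camera intrinsics and the lidar-to-camera extrinsics. This is a hedged sketch of the standard pinhole construction P = K [R | t]; the symbols K, R, t are conventional calibration notation, not identifiers from either paper:

```python
import numpy as np

def lidar_to_image_matrix(K, R, t):
    """Compose a (3, 4) matrix mapping homogeneous lidar coordinates
    to homogeneous pixel coordinates: P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(points_xyz, P):
    """Project (N, 3) lidar points to (N, 2) pixel coordinates."""
    hom = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    uvw = hom @ P.T
    # Perspective divide by depth
    return uvw[:, :2] / uvw[:, 2:3]
```

Misalignment in K, R, or t shifts every painted score to the wrong points, which is exactly the feature-dislocation risk the quoted passage raises for early fusion.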
“…Then, PointNets can be applied for 3D bounding box estimation, but the overall procedure relies heavily on the performance of the 2D detector. PointPainting [30] projects the pixel-wise semantic features from an image-based semantic segmentation model onto the corresponding points in the point cloud to boost the performance of 3D object detection.…”
mentioning confidence: 99%
“…It can be observed that the main disadvantage of dense point-pixel fusion methods such as [30] is that they incur a considerable amount of redundant computation. Meanwhile, a BEV-image fusion method allows deep learning-based fusion of feature maps captured from individual viewpoints, but with geometric information losses.…”
mentioning confidence: 99%