2021
DOI: 10.48550/arxiv.2104.11896
Preprint

M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

Abstract: We present a novel architecture for 3D object detection, M3DETR, which combines different point cloud representations (raw points, voxels, bird's-eye view) with different feature scales based on multi-scale feature pyramids. M3DETR is the first approach that unifies multiple point cloud representations and feature scales, and simultaneously models mutual relationships between point clouds, using transformers. We perform extensive ablation experiments that highlight the benefits of fusing representation and scale, an…
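The fusion described in the abstract (per-representation, per-scale features whose mutual relations are modeled jointly by transformers) can be illustrated with a minimal sketch. The module below is not the authors' M3DETR implementation: the class name MultiRepTransformerFusion, the token counts, and the embedding sizes are assumptions, and the branch feature extractors are stubbed with random tensors.

```python
# A minimal sketch (not the authors' M3DETR code): fuse per-representation,
# per-scale point cloud features with a standard transformer encoder.
# Feature extractors are stubbed with random tensors; all names, token
# counts, and sizes below are illustrative assumptions.
import torch
import torch.nn as nn


class MultiRepTransformerFusion(nn.Module):
    def __init__(self, dim=256, heads=8, layers=2):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers)
        # Learned embeddings tell the encoder which representation and which
        # pyramid scale each token came from.
        self.rep_embed = nn.Embedding(3, dim)    # raw points / voxels / BEV
        self.scale_embed = nn.Embedding(4, dim)  # up to 4 pyramid scales

    def forward(self, tokens, rep_ids, scale_ids):
        # tokens: (B, N, dim) features pooled from every representation/scale
        x = tokens + self.rep_embed(rep_ids) + self.scale_embed(scale_ids)
        # Self-attention models mutual relations across all tokens at once.
        return self.encoder(x)


if __name__ == "__main__":
    B, dim = 2, 256
    # Hypothetical token counts per branch.
    raw_feats = torch.randn(B, 128, dim)    # raw-point branch
    voxel_feats = torch.randn(B, 64, dim)   # voxel branch
    bev_feats = torch.randn(B, 32, dim)     # bird's-eye-view branch
    tokens = torch.cat([raw_feats, voxel_feats, bev_feats], dim=1)
    rep_ids = torch.cat([torch.full((128,), 0), torch.full((64,), 1),
                         torch.full((32,), 2)]).unsqueeze(0).expand(B, -1)
    scale_ids = torch.zeros(B, tokens.shape[1], dtype=torch.long)
    fused = MultiRepTransformerFusion(dim)(tokens, rep_ids, scale_ids)
    print(fused.shape)  # torch.Size([2, 224, 256])
```

In the actual method, the tokens would come from dedicated point, voxel, and bird's-eye-view backbones; the sketch only shows how a single transformer encoder can attend across all representations and scales at once.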

Cited by 5 publications (5 citation statements)
References 56 publications
“…Secondly, our current two-stage refinement modules only use features from the bird-eye view which may not take full advantage of the high-resolution virtual points generated from our algorithm. We believe point or voxel-based two-stage 3D detectors like PVRCNN [45] and M3Detr [15] may give more significant improvements. Finally, the point-based abstraction connecting 2D and 3D detection may introduce too large of a bottleneck to transmit information from 2D to 3D.…”
Section: Discussion (mentioning)
confidence: 98%
“…Alternatively, PSNet [44] proposes a fast data structuring method to tackle the data structuring issue in point-based methods. There are also other methods that combine both point-wise and voxel-wise features in a local field [10], [45] or a global field [46]-[48]. In addition, some of these approaches propose using continuous convolution on the 3D point clouds by the designed kernel structure.…”
Section: Related Work (mentioning)
confidence: 99%
“…Consequently, Transformer architectures have also started to emerge for point cloud processing. For instance, [24] and [25] employ Transformers for 3D object detection, whereas [26] developed a method for point cloud segmentation. However, due to the complex nature of outdoor LiDAR data, Transformer models are yet to establish themselves on outdoor benchmarks.…”
Section: Related Work (mentioning)
confidence: 99%