2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00752
Multi-Task Multi-Sensor Fusion for 3D Object Detection

Abstract: In this paper, we propose to exploit multiple related tasks for accurate multi-sensor 3D object detection. Towards this goal, we present an end-to-end learnable architecture that reasons about 2D and 3D object detection as well as ground estimation and depth completion. Our experiments show that all these tasks are complementary and help the network learn better representations by fusing information at various levels. Importantly, our approach leads the KITTI benchmark on 2D, 3D and BEV object detection, while b…
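A minimal sketch of the point-wise multi-sensor fusion idea the abstract alludes to: LiDAR points are projected into the camera image, image features are sampled at the projected pixels, and the sampled features are scattered into the bird's-eye-view (BEV) grid that the 3D detection branch operates on. All function names, shapes, and the grid resolution below are illustrative assumptions, not the paper's implementation.

```python
# Sketch of point-wise image-to-BEV feature fusion (illustrative, not MMF's code).
import torch

def fuse_image_features_to_bev(points, img_feat, K, bev_shape, cell=0.2):
    """points: (N, 3) LiDAR points already in the camera frame (z forward);
    img_feat: (C, H, W) image feature map; K: (3, 3) camera intrinsics;
    bev_shape: (H_bev, W_bev) output grid; cell: BEV cell size in meters."""
    C, H, W = img_feat.shape
    # Project 3D points to pixel coordinates: u = fx*x/z + cx, v = fy*y/z + cy.
    uv = (K @ points.T).T
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)
    u = uv[:, 0].round().long().clamp(0, W - 1)
    v = uv[:, 1].round().long().clamp(0, H - 1)
    sampled = img_feat[:, v, u]                      # (C, N) per-point features
    # Scatter each point's image feature into its BEV cell on the x-z plane.
    H_bev, W_bev = bev_shape
    col = (points[:, 0] / cell + W_bev / 2).long().clamp(0, W_bev - 1)
    row = (points[:, 2] / cell).long().clamp(0, H_bev - 1)
    bev = torch.zeros(C, H_bev, W_bev)
    bev[:, row, col] = sampled                       # last write wins per cell
    return bev

# Toy usage with KITTI-like intrinsics and a random point cloud.
K = torch.tensor([[721.5, 0.0, 609.6], [0.0, 721.5, 172.9], [0.0, 0.0, 1.0]])
pts = torch.rand(500, 3) * torch.tensor([20.0, 2.0, 40.0]) - torch.tensor([10.0, 1.0, 0.0])
feat = torch.randn(64, 375, 1242)
bev = fuse_image_features_to_bev(pts, feat, K, bev_shape=(200, 176))
```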

Cited by 588 publications (339 citation statements)
References 29 publications
“…Autonomous Driving Datasets with Trajectory Data. ApolloScape [26] also uses sensor-equipped vehicles to observe driving trajectories in the wild and presents a forecasting benchmark [41]. [37] show how 3D object detection accuracy can be improved by using mapping (ground height estimation) as an additional task in multi-task learning. Suraj et al. [40] use dashboard-mounted monocular cameras on a fleet of vehicles to build a 3D map via city-scale structure-from-motion for localization of ego-vehicles and trajectory extraction.…”
Section: Related Work
mentioning, confidence: 99%
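As a rough illustration of the multi-task idea attributed to [37] above, the sketch below pairs a detection head with an auxiliary ground-height head on a shared backbone and combines their losses. The architecture, names, and the 0.5 loss weight are illustrative assumptions, not the cited method.

```python
# Sketch of multi-task learning with ground estimation as an auxiliary task.
import torch
import torch.nn as nn

class DetectorWithGroundHead(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Shared backbone over a BEV raster.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        # Main task: per-cell detection score; auxiliary: ground height.
        self.det_head = nn.Conv2d(channels, 1, 1)
        self.ground_head = nn.Conv2d(channels, 1, 1)

    def forward(self, bev):
        feat = self.backbone(bev)
        return self.det_head(feat), self.ground_head(feat)

model = DetectorWithGroundHead()
bev = torch.randn(2, 3, 128, 128)            # toy BEV rasters
det, ground = model(bev)
det_target = torch.rand_like(det)
ground_target = torch.randn_like(ground)
# Auxiliary task down-weighted; 0.5 is an arbitrary illustrative choice.
loss = nn.functional.binary_cross_entropy_with_logits(det, det_target) \
       + 0.5 * nn.functional.l1_loss(ground, ground_target)
loss.backward()
```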
“…F-PointNet [20] extends 2D detections from the image into corresponding frustums in 3D space. MMF [13] is proposed to exploit multiple related tasks, including depth completion and 2D object detection, for accurate multi-sensor 3D object detection. However, although multiple sensors can provide extra information, the inference efficiency of these frameworks is relatively low.…”
Section: 3D Object Detection
mentioning, confidence: 99%
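To make the frustum idea concrete, here is a hedged sketch in the spirit of F-PointNet [20]: a 2D detection is lifted into 3D by keeping only the LiDAR points whose image projection falls inside the 2D box. The camera frame, the intrinsics, and all names are illustrative assumptions, not the cited implementation.

```python
# Sketch of frustum-style point selection from a 2D detection.
import numpy as np

def points_in_frustum(points, box2d, K):
    """points: (N, 3) LiDAR points already in the camera frame (z forward).
    box2d: (x1, y1, x2, y2) pixel box from a 2D detector.
    K: (3, 3) camera intrinsics."""
    # Keep points in front of the camera to avoid degenerate projections.
    front = points[points[:, 2] > 0.1]
    # Project to pixels: u = fx*x/z + cx, v = fy*y/z + cy.
    uv = (K @ front.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    x1, y1, x2, y2 = box2d
    mask = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & \
           (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    return front[mask]

# Toy usage with a synthetic cloud and a KITTI-like intrinsic matrix.
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
cloud = np.random.uniform([-20, -2, 0.5], [20, 2, 60], size=(1000, 3))
frustum = points_in_frustum(cloud, (500, 150, 700, 250), K)
```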
“…Multi-scale strategies [14, 13, 2] have been proven effective for 3D object detection. ContFuse [14] uses continuous convolution to aggregate multi-scale feature maps from different ResNet blocks [7].…”
Section: Multi-scale Feature Aggregation
mentioning, confidence: 99%
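The following sketch illustrates generic multi-scale feature aggregation of the kind described above: feature maps from different ResNet stages are brought to a common channel width, upsampled to the finest resolution, and summed. This is an FPN-style illustration under assumed shapes and names, not ContFuse's continuous convolution.

```python
# Sketch of multi-scale feature aggregation across backbone stages.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAggregator(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        # 1x1 convs bring every stage to a shared channel width.
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])

    def forward(self, features):
        # features: list of (B, C_i, H_i, W_i) maps, finest resolution first.
        target = features[0].shape[-2:]
        fused = 0
        for lat, f in zip(self.lateral, features):
            # Upsample coarser maps to the finest resolution, then sum.
            fused = fused + F.interpolate(lat(f), size=target,
                                          mode="bilinear", align_corners=False)
        return fused

# Toy usage with three ResNet-like stages at strides 8/16/32.
maps = [torch.randn(1, 256, 64, 64),
        torch.randn(1, 512, 32, 32),
        torch.randn(1, 1024, 16, 16)]
out = MultiScaleAggregator()(maps)   # (1, 256, 64, 64)
```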
“…Apart from [84], [155], the methods that directly detect 3D BBs use stereo images [92], RGB-D cameras [86], [93], [94], [95], [152], and LiDAR sensors [152] as input. There are also several methods that fuse multiple sensor inputs and generate 3D BB hypotheses [107]. Regardless of the input (RGB, monocular or stereo; RGB-D; or LiDAR), 3D BB detection methods produce oriented 3D BBs, parameterized with center x = (x, y, z), size d = (d_w, d_h, d_l), and orientation θ_y around the gravity direction.…”
Section: Introduction
mentioning, confidence: 99%
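As a worked example of the parameterization quoted above, the snippet below recovers the eight corners of an oriented 3D BB from its center x = (x, y, z), size d = (d_w, d_h, d_l), and yaw θ_y about the gravity axis. The axis convention (y up, as in a KITTI-style camera frame) is an assumption; the surveyed methods differ in their coordinate frames.

```python
# Sketch: corners of an oriented 3D bounding box from (center, size, yaw).
import numpy as np

def box3d_corners(center, size, theta_y):
    dw, dh, dl = size                      # width, height, length
    # Eight corners of an axis-aligned box centered at the origin.
    x = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * dl / 2
    y = np.array([ 1,  1,  1,  1, -1, -1, -1, -1]) * dh / 2
    z = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * dw / 2
    corners = np.stack([x, y, z])          # (3, 8)
    # Rotate about the y (gravity) axis, then translate to the center.
    c, s = np.cos(theta_y), np.sin(theta_y)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return (R @ corners).T + np.asarray(center)   # (8, 3)

corners = box3d_corners(center=(10.0, -1.0, 25.0),
                        size=(1.8, 1.5, 4.2),
                        theta_y=np.pi / 6)
```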