2017
DOI: 10.48550/arxiv.1712.02294
Preprint

Joint 3D Proposal Generation and Object Detection from View Aggregation

Cited by 37 publications (44 citation statements) · References 12 publications
“…Currently, two distinct lines of work are followed. On the one hand, a variety of strategies to perform fusion at the feature level [16], [17], [18] have been introduced. On the other hand, some works divide the process into two steps: performing detection in the image space, and later regressing the 3D box using a subset of the LiDAR modality [6], [19].…”
Section: Related Work (mentioning)
confidence: 99%
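As a rough illustration of the second line of work, the sketch below shows the frustum step that produces the "subset of the LiDAR modality": given a 2D detection, it keeps only the points that project inside the box. The function name, the camera-frame assumption, and the (3, 4) projection matrix are illustrative assumptions, not details taken from the cited works.

```python
import numpy as np

def points_in_frustum(points_cam, box2d, P):
    """Keep the LiDAR points whose image projection falls inside a 2D box.

    points_cam: (N, 3) points already transformed into the camera frame.
    box2d:      (x1, y1, x2, y2) detection in pixel coordinates.
    P:          (3, 4) camera projection matrix.
    """
    homog = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    uvw = homog @ P.T                       # (N, 3) homogeneous pixel coords
    front = uvw[:, 2] > 1e-6                # discard points behind the camera
    uvw, pts = uvw[front], points_cam[front]
    u, v = uvw[:, 0] / uvw[:, 2], uvw[:, 1] / uvw[:, 2]
    x1, y1, x2, y2 = box2d
    inside = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    return pts[inside]                      # point subset for 3D box regression
```

The second stage would then regress the 3D box from this point subset alone, so the image detector does the hard association work and the LiDAR only refines geometry.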
“…This representation turns out to be extremely attractive since it does not exhibit any of the perspective artifacts introduced in RGB-D images, for example, and a major focus of our work is therefore to develop an implicit image-only analogue to these birds-eye-view maps. A further interesting line of research is sensor fusion methods such as AVOD [15] and MV3D [5], which make use of 3D object proposals on the ground plane to aggregate both image-based and birds-eye-view features: an operation which is closely related to our orthographic feature transform.…”
Section: Related Work (mentioning)
confidence: 99%
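The proposal-driven aggregation attributed here to AVOD and MV3D can be sketched roughly as: project a 3D proposal into the image and into the bird's-eye-view map, crop each view's feature map at the proposal's footprint, and fuse the two crops. This is a minimal numpy sketch under assumed conventions (x right, z forward); the function names, the nearest-neighbour resize standing in for RoI pooling/align, and the mean fusion are illustrative choices, not the papers' exact operations.

```python
import numpy as np

def project_to_image(corners3d, P):
    """Project (8, 3) box corners to pixels with a (3, 4) camera matrix."""
    homog = np.hstack([corners3d, np.ones((8, 1))])
    uvw = homog @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

def project_to_bev(corners3d, bev_extent, bev_shape):
    """Map (8, 3) corners onto a bird's-eye-view grid (x right, z forward)."""
    (x_min, x_max), (z_min, z_max) = bev_extent
    h, w = bev_shape
    u = (corners3d[:, 0] - x_min) / (x_max - x_min) * w
    v = (corners3d[:, 2] - z_min) / (z_max - z_min) * h
    return np.stack([u, v], axis=1)

def crop_and_fuse(img_feat, bev_feat, px_img, px_bev, out=(7, 7)):
    """Crop each view at the proposal's axis-aligned footprint, resize both
    crops to a shared grid, and fuse them element-wise (mean fusion here)."""
    def crop(feat, px):
        u1 = int(np.clip(np.floor(px[:, 0].min()), 0, feat.shape[1] - 1))
        v1 = int(np.clip(np.floor(px[:, 1].min()), 0, feat.shape[0] - 1))
        u2 = int(np.clip(np.ceil(px[:, 0].max()), u1 + 1, feat.shape[1]))
        v2 = int(np.clip(np.ceil(px[:, 1].max()), v1 + 1, feat.shape[0]))
        patch = feat[v1:v2, u1:u2]
        vi = np.linspace(0, patch.shape[0] - 1, out[0]).astype(int)
        ui = np.linspace(0, patch.shape[1] - 1, out[1]).astype(int)
        return patch[np.ix_(vi, ui)]  # nearest-neighbour stand-in for RoI align
    return 0.5 * (crop(img_feat, px_img) + crop(bev_feat, px_bev))
```

The fused, fixed-size crop is what a second-stage head would consume for classification and 3D box refinement; because the same 3D proposal anchors both crops, features from the two views are spatially aligned by construction.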
“…Integral images. Integral images have been fundamentally associated with object detection ever since their introduction in the seminal work of Viola and Jones [32]. They have formed an important component in many contemporary 3D object detection approaches including AVOD [15], MV3D [5], Mono3D [3] and 3DOP [4]. In all of these cases, however, integral images do not backpropagate gradients or form part of a fully end-to-end deep learning architecture.…”
Section: Related Work (mentioning)
confidence: 99%
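For readers unfamiliar with the technique: a summed-area table lets the sum over any axis-aligned rectangle be evaluated in constant time from four lookups, which is what makes integral images attractive for fast feature pooling in these detectors. A minimal, self-contained example (generic, not tied to any of the cited implementations):

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[:y+1, :x+1]."""
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def box_sum(ii, y1, x1, y2, x2):
    """Sum of img[y1:y2, x1:x2] in O(1) via four table lookups."""
    total = ii[y2 - 1, x2 - 1]
    if y1 > 0:
        total -= ii[y1 - 1, x2 - 1]
    if x1 > 0:
        total -= ii[y2 - 1, x1 - 1]
    if y1 > 0 and x1 > 0:
        total += ii[y1 - 1, x1 - 1]
    return total

img = np.arange(16.0).reshape(4, 4)
ii = integral_image(img)
assert box_sum(ii, 1, 1, 3, 3) == img[1:3, 1:3].sum()  # constant-time region sum
```

The table itself is just a double cumulative sum computed once up front; as the excerpt notes, the cited detectors use it as a fixed preprocessing step rather than as a differentiable layer in an end-to-end architecture.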
“…However, a single camera is naturally inaccurate in 3D localization. There are also other works exploring the use of specific depth sensors such as stereo imagery [8], [9], [10], [11], which are also relatively low-cost and provide effective depth information but have a limited sensing range; and LiDAR [12], [13], [14], [15], [16], [17], [18], which has accurate 3D localization ability but is less informative and sensitive to reflections (e.g., rain, car windows). To achieve robust perception, modern self-driving vehicles tend to be equipped with multiple different sensors, whose 3D information is represented in quite different ways (e.g., high-level semantic cues for the monocular image, pixel-level disparity for stereo images, sparse but geometry-aware point clouds for LiDARs).…”
Section: Introduction (mentioning)
confidence: 99%
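The "limited sensing range" of stereo follows from the depth-disparity relation Z = fB/d: for a fixed disparity error, depth error grows quadratically with range. A small worked example, with KITTI-like focal length and baseline assumed purely for illustration:

```python
# Depth from stereo disparity: Z = f * B / d.  With a fixed disparity error
# of ~1 px, the first-order depth error is dZ = Z**2 / (f * B), so stereo is
# effective nearby but degrades quickly with distance.
# f (focal length, px) and B (baseline, m) are assumed KITTI-like values.
f, B, d_err = 721.0, 0.54, 1.0
for Z in (10.0, 30.0, 60.0):
    d = f * B / Z                     # disparity observed at range Z
    dZ = Z ** 2 / (f * B) * d_err     # first-order depth error
    print(f"Z = {Z:4.0f} m  disparity = {d:5.1f} px  depth error ~ {dZ:5.2f} m")
```

At 10 m the error is centimetres; at 60 m it approaches 10 m, which is why stereo-only methods are considered range-limited relative to LiDAR.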