2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.00369
VENet: Voting Enhancement Network for 3D Object Detection

Abstract: Hough voting, as demonstrated in VoteNet, is effective for 3D object detection, where voting is a key step. In this paper, we propose a novel VoteNet-based 3D detector with vote enhancement to improve detection accuracy in cluttered indoor scenes. It addresses a limitation of current voting schemes: votes from neighboring objects and background have a significant negative impact. Before voting, we replace the classic MLP with the proposed Attentive MLP (AMLP) in the backbone network to get…
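The abstract's core backbone change is replacing the classic MLP with an Attentive MLP (AMLP) before voting. The excerpt does not give the AMLP design, so the following is only a minimal sketch of one plausible attentive-MLP layer, in which a learned per-channel attention branch reweights the output of a standard fully connected layer. All names (`amlp_layer`, the weight arguments) are illustrative, not the authors' implementation.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def linear(x, w, b):
    # w: out_dim x in_dim weight matrix, b: out_dim bias vector
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def amlp_layer(x, w, b, w_att, b_att):
    """Hypothetical attentive MLP layer: the classic MLP output is
    reweighted by channel-attention scores computed from the same input."""
    feat = relu(linear(x, w, b))             # classic MLP branch
    att = softmax(linear(x, w_att, b_att))   # per-channel attention weights
    return [f * a for f, a in zip(feat, att)]

# With zero attention weights the softmax is uniform (0.5 per channel),
# so the MLP output [1.0, 2.0] is scaled to [0.5, 1.0]:
out = amlp_layer([1.0, 2.0],
                 [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],   # MLP branch
                 [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])   # attention branch
```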

Cited by 40 publications (11 citation statements). References 45 publications (42 reference statements).
“…The results are summarized in Table 2. With the same backbone network of a standard PointNet++, our approach achieves 70.2 mAP@0.25 and 54.2 mAP@0.5 using 66 rays and 256 object candidates, which is 2.5 and 3.3 better than previous best methods [42,7] using the same backbones. With stronger backbones and more sampled object candidates just like [21], i.e., 2× more channels and 512 candidates, our approach is also improved dramatically, achieving 70.6 mAP@0.25 and 55.2 mAP@0.5, which is still 1.5 and 2.4 better than [21].…”
Section: Comparison With State-of-the-art Methods
confidence: 93%
“…To evaluate point-based GHA for object detection, we experiment on ScanNet detection dataset [11], which contains 1,513 indoor scans with annotated bounding boxes, split into 1,201 scenes for training and 312 for validation. Following prior works [36,6,70,59], we report the mean average precision, specifically mAP 25 and mAP 50 , computed at a 0.25 and 0.5 IoU threshold respectively. Setting.…”
Section: Point-GHA: 3D Object Detection
confidence: 99%
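The mAP 25 and mAP 50 metrics cited above count a predicted box as correct when its 3D IoU with a ground-truth box exceeds 0.25 or 0.5 respectively. As a quick illustration of the threshold (not the papers' evaluation code), here is 3D IoU for axis-aligned boxes:

```python
def iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes,
    each given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:          # no overlap along this axis
            return 0.0
        inter *= hi - lo

    def vol(box):
        return ((box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2]))

    return inter / (vol(a) + vol(b) - inter)

# Two unit cubes overlapping by half along x: intersection 0.5,
# union 1.5, so IoU = 1/3 -- a match under the 0.25 threshold
# (mAP@0.25) but a miss under the stricter 0.5 threshold (mAP@0.5).
iou = iou_3d((0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1.5, 1, 1))
```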
“…Another technical branch is point-based methods. Point clouds have emerged as a great powerful representation for 3D deep learning tasks, such as classification [15][16][17][18][19][20], semantic segmentation [21][22][23], point cloud normal estimation [24], 3D reconstruction [25][26][27], and 3D object detection [28][29][30][31]. Most of these works adopt raw point clouds to extract expressive representations based on pioneering work PointNet/PointNet++ [15,16].…”
Section: Introductionmentioning
confidence: 99%