2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01272

PCAN: 3D Attention Map Learning Using Contextual Information for Point Cloud Based Retrieval

Abstract: Point cloud based retrieval for place recognition is an emerging problem in the vision field. The main challenge is how to find an efficient way to encode the local features into a discriminative global descriptor. In this paper, we propose a Point Contextual Attention Network (PCAN), which can predict the significance of each local point feature based on point context. Our network makes it possible to pay more attention to the task-relevant features when aggregating local features. Experiments on various benchmark…
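As a rough illustration of the idea described in the abstract (not the authors' actual architecture; the layer sizes, tensor shapes, and the use of PyTorch are assumptions made here), a small per-point scoring network can weight each local feature before the features are pooled into a single global descriptor:

# Minimal sketch, assuming (B, N, D) local features: predict a per-point
# significance score and use it to weight the features before pooling.
import torch
import torch.nn as nn


class AttentionPooling(nn.Module):
    """Toy attention-weighted pooling of per-point local features."""

    def __init__(self, feat_dim: int = 1024):
        super().__init__()
        # Small MLP mapping each local feature to one significance score.
        self.score_mlp = nn.Sequential(
            nn.Linear(feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, local_feats: torch.Tensor) -> torch.Tensor:
        # local_feats: (batch, num_points, feat_dim)
        scores = self.score_mlp(local_feats)               # (B, N, 1)
        weights = torch.softmax(scores, dim=1)             # normalize over points
        global_desc = (weights * local_feats).sum(dim=1)   # (B, feat_dim)
        # L2-normalize the descriptor, as is common for retrieval.
        return nn.functional.normalize(global_desc, dim=-1)


# Hypothetical sizes: 2 clouds, 4096 points, 1024-D local features.
feats = torch.randn(2, 4096, 1024)
desc = AttentionPooling(1024)(feats)   # (2, 1024)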

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

0
134
0

Year Published

2020
2020
2023
2023

Publication Types

Select...
3
3
1

Relationship

0
7

Authors

Journals

Cited by 201 publications (143 citation statements) · References 40 publications
“…Inspired by the self-attention idea in natural language processing [40], recent works connect the self-attention mechanism with contextual information mining to improve scene understanding tasks such as image recognition [41], semantic segmentation [11] and point cloud recognition [42]. As for 3D point data processing, the work in [14] proposes to utilize an attention network to capture the contextual information in 3D points. Specifically, it presents a point contextual attention network to encode local features into a global descriptor for point cloud based retrieval.…”
Section: Contextual Information (mentioning)
confidence: 99%
“…To model the contextual information, three sub-modules are proposed in the framework, i.e., a patch-to-patch context (PPC) module, an object-to-object context (OOC) module and a global scene context (GSC) module. In particular, similar to [14], we use the self-attention mechanism to model the relationships between elements in both the PPC and OOC modules. These two sub-modules aim at adaptively encoding contextual information at the patch and object levels, respectively.…”
Section: Introduction (mentioning)
confidence: 99%
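The self-attention mechanism referenced in these statements can be summarized with a minimal sketch; this is an assumption-laden illustration, not code from the cited papers, and the 256-dimensional features, single head, and residual connection are choices made here:

# Scaled dot-product self-attention: every element (e.g. a patch or object
# feature) attends to every other element and absorbs contextual information.
import math
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_elements, dim) -- patch or object features.
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(x.size(-1)), dim=-1)
        return x + attn @ v   # residual connection keeps the original features


ctx = SelfAttention(256)(torch.randn(2, 32, 256))   # 32 hypothetical elements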
“…However, the effects of local areas and dynamic noise are not taken into account in PointNetVLAD. PCAN [ZX19] adds a Point Contextual Attention Network to PointNetVLAD, making the model pay more attention to the task-relevant areas. Through this attention mechanism, their model can extract local features to some extent.…”
Section: Scene Recognition Methods (mentioning)
confidence: 99%
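A rough sketch of how such per-point attention can modulate a NetVLAD-style aggregation is given below. This is an illustrative reconstruction under assumed shapes and layer choices, not PCAN's released implementation: the attention head here is a single linear layer with a sigmoid, and the soft-assignment is a plain linear projection.

# Attention-modulated NetVLAD-style aggregation (sketch, assumed details):
# each point's residuals to the cluster centres are scaled by both its
# soft-assignment weight and its predicted significance.
import torch
import torch.nn as nn


class AttentionalNetVLAD(nn.Module):
    def __init__(self, feat_dim: int = 1024, num_clusters: int = 64):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_clusters, feat_dim))
        self.assign = nn.Linear(feat_dim, num_clusters)   # soft-assignment logits
        self.attn = nn.Linear(feat_dim, 1)                # per-point significance

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D) local point features
        a = torch.softmax(self.assign(x), dim=-1)           # (B, N, K)
        w = torch.sigmoid(self.attn(x))                     # (B, N, 1)
        residuals = x.unsqueeze(2) - self.centers            # (B, N, K, D)
        vlad = (a.unsqueeze(-1) * w.unsqueeze(-1) * residuals).sum(dim=1)  # (B, K, D)
        vlad = nn.functional.normalize(vlad, dim=-1)          # intra-normalization
        return nn.functional.normalize(vlad.flatten(1), dim=-1)  # (B, K*D) descriptor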
“…Our models are compared with recent state-of-the-art methods: PointNetVLAD [AUHL18], PCAN [ZX19], DAGC [SLH*20] and LPD-Net [LZS*19]. Performance is evaluated by average recall at top 1% and average recall at top 1.…”
Section: Comparison With State-of-the-Art Methods (mentioning)
confidence: 99%
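The recall metric mentioned here can be illustrated with the sketch below. The protocol details (Euclidean distance between descriptors, ground-truth matches given as index sets) are assumptions for illustration; "top 1%" means the candidate list length is 1% of the database size.

# Average recall at top-N for descriptor retrieval: a query counts as a hit
# if any of its N nearest database descriptors is a true match.
import numpy as np


def average_recall_at_n(query_desc, db_desc, true_matches, n):
    """query_desc: (Q, D), db_desc: (M, D), true_matches: list of sets of db indices."""
    # Euclidean distances between every query and every database descriptor.
    dists = np.linalg.norm(query_desc[:, None, :] - db_desc[None, :, :], axis=-1)
    topn = np.argsort(dists, axis=1)[:, :n]               # indices of n nearest
    hits = [len(set(row.tolist()) & true_matches[q]) > 0 for q, row in enumerate(topn)]
    return float(np.mean(hits))


# Example with random data: recall at top 1% of a 1000-entry database (n = 10).
rng = np.random.default_rng(0)
q, db = rng.normal(size=(5, 256)), rng.normal(size=(1000, 256))
gt = [set(rng.integers(0, 1000, size=3).tolist()) for _ in range(5)]
print(average_recall_at_n(q, db, gt, n=10))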