Shiyi Lan scite author profile

Recent advances in deep convolutional neural networks (CNNs) have motivated researchers to adapt CNNs to directly model points in 3D point clouds. Modeling local structure has proven important for the success of convolutional architectures, and researchers have exploited the modeling of local point sets in the feature extraction hierarchy. However, limited attention has been paid to explicitly modeling the geometric structure amongst points in a local region. To address this problem, we propose Geo-CNN, which applies a generic convolution-like operation dubbed GeoConv to each point and its local neighborhood. Local geometric relationships among points are captured when extracting edge features between the center and its neighboring points. We first decompose the edge feature extraction process onto three orthogonal bases, and then aggregate the extracted features based on the angles between the edge vector and the bases. This encourages the network to preserve the geometric structure in Euclidean space throughout the feature extraction hierarchy. GeoConv is a generic and efficient operation that can be easily integrated into 3D point cloud analysis pipelines for multiple applications. We evaluate Geo-CNN on ModelNet40 and KITTI and achieve state-of-the-art performance.
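The decompose-then-aggregate step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the three bases are the coordinate axes, that the sign of the edge vector is ignored, and that the angle-based aggregation uses squared cosines (which sum to 1 for an orthonormal basis). The function name `geoconv_edge_feature` and the weight layout are hypothetical.

```python
import numpy as np

def geoconv_edge_feature(center, neighbor, neighbor_feat, axis_weights):
    """Sketch of an angle-weighted edge feature (assumed formulation).

    center, neighbor: (3,) point coordinates.
    neighbor_feat:    (C_in,) feature of the neighboring point.
    axis_weights:     (3, C_in, C_out) one hypothetical weight matrix per axis.
    """
    edge = neighbor - center
    norm = np.linalg.norm(edge)
    if norm == 0.0:
        # Degenerate edge: fall back to equal weighting of the three axes.
        coeffs = np.full(3, 1.0 / 3.0)
    else:
        # Squared cosine of the angle between the edge and each axis.
        coeffs = (edge / norm) ** 2
    # Project the neighbor feature with each axis-specific weight matrix.
    per_axis = np.einsum("i,aio->ao", neighbor_feat, axis_weights)  # (3, C_out)
    # Aggregate the three projections using the angle-based coefficients.
    return coeffs @ per_axis

# Toy usage: identity-like weights scaled by 1, 2, 3 for the x, y, z axes.
weights = np.stack([np.eye(2) * k for k in (1.0, 2.0, 3.0)])
feat = np.array([1.0, 2.0])
along_x = geoconv_edge_feature(np.zeros(3), np.array([2.0, 0.0, 0.0]), feat, weights)
diagonal = geoconv_edge_feature(np.zeros(3), np.array([1.0, 1.0, 0.0]), feat, weights)
```

An edge aligned with one axis uses only that axis's weights, while a diagonal edge blends the weights of the axes it spans, which is how the angle information enters the feature in this sketch.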

FastMask: Segment Multi-scale Object Candidates in One Shot

Lan, Jiang, et al. 2017

Objects appear to scale differently in natural images. This fact requires methods dealing with object-centric tasks (e.g. object proposal) to have robust performance over variances in object scales. In this paper, we present a novel segment proposal framework, namely FastMask, which takes advantage of hierarchical features in deep convolutional neural networks to segment multi-scale objects in one shot. Innovatively, we adapt the segment proposal network into three different functional components (body, neck and head). We further propose a weight-shared residual neck module as well as a scale-tolerant attentional head module for efficient one-shot inference. On the MS COCO benchmark, the proposed FastMask outperforms all state-of-the-art segment proposal methods in average recall while being 2∼5 times faster. Moreover, with a slight trade-off in accuracy, FastMask can segment objects in near real time (∼13 fps) on 800×600 resolution images, demonstrating its potential in practical applications. Our implementation is available on

DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision

Lan, Yu, Choy, et al. 2021

We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision. Specifically, we propose a self-ensembling framework where instance segmentation and semantic correspondence are jointly guided by a structured teacher in addition to the bounding box supervision. The teacher is a structured energy model incorporating a pairwise potential and a cross-image potential to model the pairwise pixel relationships both within and across the boxes. Minimizing the teacher energy simultaneously yields refined object masks and dense correspondences between intra-class objects, which are taken as pseudo-labels to supervise the task network and provide positive/negative correspondence pairs for dense contrastive learning. We show a symbiotic relationship where the two tasks mutually benefit from each other. Our best model achieves 37.9% AP on COCO instance segmentation, surpassing prior weakly supervised methods and remaining competitive with supervised methods. We also obtain state-of-the-art weakly supervised results on PASCAL VOC12 and PF-PASCAL with real-time inference.

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

Meng, Chen, et al. 2022

M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

Guan, Wang, Lan, et al. 2022

Shiyi Lan

Modeling Local Geometric Structure of 3D Point Clouds Using Geo-CNN
