For 3D object tracking in point clouds, we propose a novel point-to-box network, termed P2B, that is trained end-to-end. Our main idea is to first localize potential target centers in a 3D search area embedded with target information. Point-driven 3D target proposal and verification are then executed jointly, which avoids a time-consuming exhaustive 3D search. Specifically, we first sample seeds from the point clouds of the template and the search area, respectively. We then apply permutation-invariant feature augmentation to embed target clues from the template into the search-area seeds and represent them with target-specific features. The augmented search-area seeds then regress the potential target centers via Hough voting, and the centers are further strengthened with seed-wise targetness scores. Finally, each center clusters its neighbors to leverage ensemble power for joint 3D target proposal and verification. We use PointNet++ as our backbone, and experiments on the KITTI tracking dataset demonstrate P2B's superiority (∼10% improvement over the state of the art). P2B runs at 40 FPS on a single NVIDIA 1080Ti GPU. Our code and model are available at https://github.com/HaozheQi/P2B.
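
The voting and clustering steps described in this abstract can be illustrated with a small hand-written sketch. It is not the authors' implementation: in P2B the offsets and targetness scores come from learned networks, whereas here they are given as plain inputs, and the function name `propose_center`, the `radius` parameter, and the score-weighted averaging are assumptions made only for illustration.

```python
import math

def propose_center(seeds, offsets, scores, radius=0.5):
    """Toy stand-in for vote, cluster, and verify (hypothetical helper).

    seeds   : list of (x, y, z) seed positions in the search area
    offsets : list of (dx, dy, dz) offsets, one per seed (assumed given)
    scores  : list of per-seed targetness scores in [0, 1] (assumed given)
    """
    # Each seed votes for a potential target center: position + offset.
    votes = [tuple(p + d for p, d in zip(seed, off))
             for seed, off in zip(seeds, offsets)]

    best_center, best_score = None, 0.0
    for c in votes:
        # Cluster the votes that fall within `radius` of this vote.
        members = [(v, w) for v, w in zip(votes, scores)
                   if math.dist(c, v) <= radius]
        total = sum(w for _, w in members)
        # Keep the cluster with the largest accumulated targetness.
        if total > best_score:
            best_center = tuple(
                sum(w * v[k] for v, w in members) / total for k in range(3)
            )
            best_score = total
    return best_center, best_score
```

With three seeds whose votes agree and one low-scoring outlier, the agreeing cluster wins because its accumulated score is higher; the outlier's vote does not affect the returned center.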

Monocular Relative Depth Perception with Web Stereo Data Supervision

Xian

et al. 2018

Structure-Guided Ranking Loss for Single Image Depth Prediction

et al. 2020

A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image

Zhang

et al. 2019

For 3D hand and body pose estimation from a depth image, we propose a novel anchor-based approach, termed the Anchor-to-Joint regression network (A2J), that can be trained end-to-end. Within A2J, anchor points able to capture global-local spatial context information are densely set on the depth image as local regressors for the joints. They jointly predict the joint positions in an ensemble manner to enhance generalization ability. The proposed 3D articulated pose estimation paradigm differs from the state-of-the-art encoder-decoder based FCN, 3D CNN, and point-set based approaches. To discover informative anchor points for a given joint, an anchor proposal procedure is also proposed for A2J. A 2D CNN (i.e., ResNet-50) is used as the backbone network to drive A2J, without time-consuming 3D convolutional or deconvolutional layers. Experiments on three hand datasets and two body datasets verify A2J's superiority. A2J also runs at around 100 FPS on a single NVIDIA 1080Ti GPU.
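
The ensemble prediction described in this abstract can be sketched as a weighted average of per-anchor votes. This is a minimal illustration, not the authors' code: the abstract does not state how anchors are weighted, so the softmax over per-anchor responses is an assumption here, and `aggregate_joint` and its arguments are hypothetical names.

```python
import math

def aggregate_joint(anchors, offsets, responses):
    """Combine per-anchor votes into one 2D joint estimate (hypothetical helper).

    anchors   : list of (x, y) anchor positions on the depth image
    offsets   : list of (dx, dy) predicted offsets from anchor to joint
    responses : list of raw per-anchor scores (assumed network outputs)
    """
    # Softmax over responses (assumed weighting), shifted for numerical stability.
    m = max(responses)
    exps = [math.exp(r - m) for r in responses]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Each anchor votes for anchor + offset; the estimate is the weighted sum.
    x = sum(w * (a[0] + d[0]) for w, a, d in zip(weights, anchors, offsets))
    y = sum(w * (a[1] + d[1]) for w, a, d in zip(weights, anchors, offsets))
    return x, y
```

When every anchor votes for the same location, the estimate equals that location regardless of the weights; when the votes disagree, anchors with larger responses dominate.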

A fast and robust local descriptor for 3D point cloud registration

Yang

Cao

Zhang

2016

Information Sciences

181

117

TOLDI: An effective and robust approach for 3D local shape description

Yang

Zhang

Xiao

et al. 2017

Pattern Recognition

117

116

From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer

et al. 2019

Visual counting, a task that aims to estimate the number of objects in an image or video, is an open-set problem by nature, i.e., the count can in theory vary in [0, +∞). However, collected data and labeled instances are limited in reality, which means that only a small closed set is observed. Existing methods typically model this task as regression, and they tend to fail on unseen scenes whose counts fall outside the closed set. In fact, counting has an interesting and exclusive property: it is spatially decomposable. A dense region can always be divided until the sub-region counts are within the previously observed closed set. We therefore introduce the idea of spatial divide-and-conquer (S-DC), which transforms open-set counting into a closed-set problem. This idea is implemented by a novel Supervised Spatial Divide-and-Conquer Network (SS-DCNet). SS-DCNet thus learns only from a closed set but generalizes well to open-set scenarios via S-DC. SS-DCNet is also efficient: to avoid repeatedly computing sub-region convolutional features, S-DC is executed on the feature map instead of on the input image. We provide theoretical analyses as well as a controlled experiment on toy data, demonstrating why closed-set modeling makes sense. Extensive experiments show that SS-DCNet achieves state-of-the-art performance on three crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF), a vehicle counting dataset (TRANCOS) and a plant counting dataset (MTC), with a 7.7% relative improvement on UCF-QNRF, 33.1% on TRANCOS, and 26.4% on MTC. SS-DCNet also reports state-of-the-art cross-domain performance on crowd counting datasets. In particular, in the transfer from UCF-QNRF to ShanghaiTech Part_A, SS-DCNet even beats most existing models trained directly on the target domain. Code and models have been made available at: https://tinyurl.com/SS-DCNet.
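
The spatial decomposition argument can be made concrete with a toy example. The sketch below is not SS-DCNet: it works on explicit point coordinates rather than feature maps, and `closed_set_count`, `sdc_count`, `c_max`, and `max_depth` are hypothetical names. The saturating counter stands in for a model that has only ever seen counts up to `c_max`.

```python
def closed_set_count(points, c_max):
    """Hypothetical counter that saturates at c_max, like a closed-set model."""
    return min(len(points), c_max)

def sdc_count(points, x0, y0, x1, y1, c_max, depth=0, max_depth=8):
    """Count points in [x0, x1) x [y0, y1) by recursive quadrant splitting."""
    inside = [(x, y) for x, y in points if x0 <= x < x1 and y0 <= y < y1]
    local = closed_set_count(inside, c_max)
    # A count below c_max is within the closed set and can be trusted.
    if local < c_max or depth == max_depth:
        return local
    # Otherwise divide the region into four quadrants and sum their counts.
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = [(x0, y0, mx, my), (mx, y0, x1, my),
                 (x0, my, mx, y1), (mx, my, x1, y1)]
    return sum(sdc_count(inside, ax, ay, bx, by, c_max, depth + 1, max_depth)
               for ax, ay, bx, by in quadrants)

# 100 points on a regular grid; the counter alone can report at most 5.
grid = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]
naive = closed_set_count(grid, 5)
divided = sdc_count(grid, 0, 0, 10, 10, 5)
```

Here the saturating counter applied to the whole region reports only `c_max`, while the divided count recovers all 100 points because every leaf region holds fewer than `c_max` points.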

Zhiguo Cao

An interpretable mortality prediction model for COVID-19 patients

P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds
