“…Similarly, [31] learned deep features to build codebooks for segmentation in MRI and ultrasound images. In [14] the classical Hough algorithm was used to extract circular patterns in car logos, which were then input to a deep classification network. [33] proposed the semi-convolutional operator for 2D instance segmentation in images, which is also related to Hough voting.…”
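The classical Hough voting referenced here (e.g., extracting circular patterns in [14]) can be illustrated with a minimal sketch for circles of known radius; the function and parameters below are illustrative and not taken from any cited paper:

```python
import numpy as np

def hough_circle_votes(edge_points, radius, shape, n_angles=64):
    """Accumulate Hough votes for circle centers at a fixed radius.

    Each edge point votes for every center that could place it on a
    circle of the given radius; true centers accumulate many votes.
    """
    acc = np.zeros(shape, dtype=np.int32)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    for y, x in edge_points:
        cy = np.round(y - radius * np.sin(angles)).astype(int)
        cx = np.round(x - radius * np.cos(angles)).astype(int)
        ok = (cy >= 0) & (cy < shape[0]) & (cx >= 0) & (cx < shape[1])
        np.add.at(acc, (cy[ok], cx[ok]), 1)
    return acc

# Edge points sampled from a circle of radius 5 centered at (20, 20);
# the accumulator peak recovers the center.
thetas = np.linspace(0, 2 * np.pi, 32, endpoint=False)
pts = [(20 + 5 * np.sin(t), 20 + 5 * np.cos(t)) for t in thetas]
acc = hough_circle_votes(pts, radius=5, shape=(40, 40))
peak = np.unravel_index(np.argmax(acc), acc.shape)
```

The same vote-and-peak pattern underlies the deep variants mentioned above, with learned features replacing raw edge points.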
Current 3D object detection methods are heavily influenced by 2D detectors. To leverage architectures from 2D detectors, they often convert 3D point clouds to regular grids (i.e., voxel grids or bird's eye view images), or rely on detection in 2D images to propose 3D boxes. Few works have attempted to detect objects directly in point clouds. In this work, we return to first principles to construct a 3D detection pipeline for point cloud data that is as generic as possible. However, due to the sparse nature of the data (samples from 2D manifolds in 3D space), we face a major challenge when directly predicting bounding box parameters from scene points: a 3D object centroid can be far from any surface point and thus hard to regress accurately in one step. To address this challenge, we propose VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting. Our model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D, with a simple design, compact model size, and high efficiency. Remarkably, VoteNet outperforms previous methods using purely geometric information, without relying on color images.
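The voting intuition in the abstract can be illustrated with a toy sketch (not the actual VoteNet network): each surface point predicts an offset vote toward the object centroid, and agreeing votes are aggregated into a center proposal. Here the learned offsets are stood in for by ground truth plus noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scene: surface points sampled from the top face of a box whose
# centroid lies below the surface, i.e. far from any observed point.
centroid = np.array([1.0, 2.0, 0.5])
surface = centroid + np.column_stack([
    rng.uniform(-0.4, 0.4, 100),
    rng.uniform(-0.4, 0.4, 100),
    np.full(100, 0.5),            # all points on the top face
])

# Stand-in for the learned per-point vote offsets: ground-truth offset
# plus noise; in VoteNet a deep point set network regresses these.
offsets = (centroid - surface) + rng.normal(0, 0.02, surface.shape)
votes = surface + offsets

# Votes from the same object agree near its centroid, so a simple
# aggregate (here the mean) yields a center proposal.
proposal = votes.mean(axis=0)
```

The noisy votes cluster tightly around the true centroid even though no input point lies near it, which is the property that makes one-step direct regression from surface points unnecessary.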
“…Soon et al. [18] presented a method that aimed to automatically search and optimize a CNN architecture for VLR. Huan et al. [19] used the Hough transform to achieve accurate vehicle logo detection based on the locations of a vehicle's logo and license plate. Then, vehicle logo classification was performed with deep belief networks (DBNs).…”
Section: B. VLR Methods Based on Deep Learning
“…Step 2 (Optimization):
do
    J_sum^old = J_sum, i = 0
    while i < M do
        Obtain the input data PDM_i.
        Calculate A_1 and A_2 using (18) and (19), respectively.
        Solve (22) to obtain W_0^(i+1).
…”
In this paper, we present a novel learning-based scheme for vehicle logo recognition (VLR). This scheme is termed Multilayer Pyramid Network Based on Learning (MLPNL) and is based on the principle that considering multiple resolutions helps extract valuable features that benefit the final recognition performance. The innovations of this scheme include (1) a multilayer pyramid network, with pixel difference matrices (PDMs) as its input and output and feature parameters mapping one PDM to another; (2) an objective function and a corresponding optimization method designed to facilitate the learning of the feature parameters of the proposed multilayer pyramid network; and (3) a multi-codebook-based encoding method that makes the best use of the features extracted from PDMs corresponding to different resolutions. Extensive experiments conducted with an open dataset, HFUT-VL, demonstrate that the proposed MLPNL scheme outperforms state-of-the-art handcrafted descriptors and non-deep-learning-based methods when fewer training samples are available. Experiments conducted with a benchmark dataset, XMU, demonstrate that MLPNL outperforms existing state-of-the-art VLR methods. Experiments conducted on both HFUT-VL and XMU demonstrate that MLPNL is faster than most deep-learning-based methods while maintaining nearly the same recognition rate. Code has been made available at: https://github.com/HFUT-CV/MLPNL.
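The abstract does not define the PDMs precisely; as a rough illustration only, the sketch below assumes a pixel-difference pyramid in the spirit of a Laplacian pyramid (each level stores the detail lost when dropping one resolution), which may differ from MLPNL's actual construction:

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 average pooling."""
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbor repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def pdm_pyramid(img, levels=3):
    """Assumed pixel-difference pyramid: each level subtracts the
    image reconstructed from the next-lower resolution."""
    pyramid = []
    cur = img.astype(float)
    for _ in range(levels):
        low = downsample(cur)
        pyramid.append(cur - upsample(low))   # pixel difference matrix
        cur = low
    return pyramid

img = np.arange(64, dtype=float).reshape(8, 8)
pdms = pdm_pyramid(img, levels=3)   # shapes: 8x8, 4x4, 2x2
```

Features extracted from each level of such a pyramid capture detail at a different resolution, which matches the abstract's motivation for multi-resolution feature extraction.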
“…As a representative example, Leibe et al. [39] introduce a Hough-based object segmentation and detection method by incorporating information about supporting patterns of parts for the target category. The idea of Hough voting has been widely adopted in diverse tasks including retrieval [24], object discovery [17,44,48,50], shape recovery [59], 3D vision [34,35], and pose estimation [29], to name a few. In geometric matching, Cho et al. [6] first extend it to the Probabilistic Hough Matching (PHM) algorithm for unsupervised object discovery.…”
Despite advances in feature representation, leveraging geometric relations is crucial for establishing reliable visual correspondences under large variations between images. In this work we introduce a Hough transform perspective on convolutional matching and propose an effective geometric matching algorithm, dubbed Convolutional Hough Matching (CHM). The method distributes similarities of candidate matches over a geometric transformation space and evaluates them in a convolutional manner. We cast it into a trainable neural layer with a semi-isotropic high-dimensional kernel, which learns non-rigid matching with a small number of interpretable parameters. To validate its effect, we develop a neural network with CHM layers that performs convolutional matching in the space of translation and scaling. Our method sets a new state of the art on standard benchmarks for semantic visual correspondence, proving its strong robustness to challenging intra-class variations.
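The core CHM idea, distributing match similarities over a transformation space, can be sketched for pure translation; this is a brute-force toy, not the paper's trainable semi-isotropic kernel:

```python
import numpy as np

def hough_match_translation(feat_a, feat_b):
    """Distribute match similarities over a translation (offset) space.

    feat_a, feat_b: (H, W, C) L2-normalized feature maps. Every pair
    of positions casts its cosine similarity as a vote for the offset
    that would align them; peaks indicate dominant translations.
    """
    H, W, _ = feat_a.shape
    acc = np.zeros((2 * H - 1, 2 * W - 1))
    for ya in range(H):
        for xa in range(W):
            # similarity of position (ya, xa) in A to all positions in B
            sim = np.einsum('c,hwc->hw', feat_a[ya, xa], feat_b)
            for yb in range(H):
                for xb in range(W):
                    acc[yb - ya + H - 1, xb - xa + W - 1] += sim[yb, xb]
    return acc

# Feature map B is A shifted by (1, 2); the vote peak recovers it.
rng = np.random.default_rng(1)
feat_a = rng.normal(size=(6, 6, 8))
feat_a /= np.linalg.norm(feat_a, axis=-1, keepdims=True)
feat_b = np.roll(feat_a, shift=(1, 2), axis=(0, 1))
acc = hough_match_translation(feat_a, feat_b)
dy, dx = np.unravel_index(np.argmax(acc), acc.shape)
```

Individually ambiguous matches reinforce each other at the true offset, which is the geometric consensus that CHM exploits; the paper's contribution is to make this accumulation a learnable convolution over translation and scaling rather than the fixed brute-force sum above.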