Object pose detection and tracking has recently attracted increasing attention due to its wide applications in many areas, such as autonomous driving, robotics, and augmented reality. Among methods for object pose detection and tracking, deep learning is the most promising one that has shown better performance than others. However, survey study about the latest development of deep learning-based methods is lacking. Therefore, this study presents a comprehensive review of recent progress in object pose detection and tracking that belongs to the deep learning technical route. To achieve a more thorough introduction, the scope of this study is limited to methods taking monocular RGB/RGBD data as input and covering three kinds of major tasks: instance-level monocular object pose detection, category-level monocular object pose detection, and monocular object pose tracking. In our work, metrics, datasets, and methods of both detection and tracking are presented in detail. Comparative results of current state-of-the-art methods on several publicly available datasets are also presented, together with insightful observations and inspiring future research directions.
Simultaneous Localization and Mapping (SLAM) and Autonomous Driving are becoming increasingly more important in recent years. Point cloud-based large scale place recognition is the spine of them. While many models have been proposed and have achieved acceptable performance by learning short-range local features, they always skip long-range contextual properties. Moreover, the model size also becomes a serious shackle for their wide applications. To overcome these challenges, we propose a super light-weight network model termed SVT-Net. On top of the highly efficient 3D Sparse Convolution (SP-Conv), an Atom-based Sparse Voxel Transformer (ASVT) and a Cluster-based Sparse Voxel Transformer (CSVT) are proposed respectively to learn both short-range local features and long-range contextual features. Consisting of ASVT and CSVT, SVT-Net can achieve state-of-the-art performance in terms of both recognition accuracy and running speed with a super-light model size (0.9M parameters). Meanwhile, for the purpose of further boosting efficiency, we introduce two simplified versions, which also achieve state-of-the-art performance and further reduce the model size to 0.8M and 0.4M respectively.
Point cloud based 3D scene recognition is fundamental to many real world applications such as Simultaneous Localization and Mapping (SLAM). However, most of existing methods do not take full advantage of the contextual semantic features of scenes. And their recognition abilities are severely affected by dynamic noise such as points of cars and pedestrians in the scene. To tackle these issues, we propose a new Scene Recognition Network, namely SRNet. In this model, to learn local features without being affected by dynamic noise, we propose Static Graph Convolution (SGC) module, which are then stacked as our backbone. Next, to further avoid dynamic noise, we introduce a Spatial Attention Module (SAM) to make the feature descriptor pay more attention to immovable local areas that are more relevant to our task. Finally, in order to make a more profound sense of the scene, we design a Dense Semantic Fusion (DSF) strategy to integrate multi-level features during feature propagation, which helps the model deepen its understanding of the contextual semantics of scenes. By utilizing these designs, SRNet can map scenes to discriminative and generalizable feature vectors, which are then used for finding matching pairs. Experimental studies demonstrate that SRNet achieves new state-of-the-art on scene recognition and shows good generalization ability to other point cloud based tasks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.