Real-time detection and localization of instance targets in real three-dimensional space plays an important role in application scenarios such as virtual reality simulation and digital twinning. Existing spatial localization methods that do not rely on lidar or similar equipment often fail to recover the true scale of the scene. To overcome this problem and achieve more accurate spatial localization, we propose an object spatial localization method that fuses 3D point clouds with instance segmentation. The method first obtains sparse 3D point cloud data through binocular stereo matching; these points encode the real scale and spatial position of the object. It then applies a deep-learning-based monocular instance segmentation to the target categories of interest, and uses the segmentation result as a foreground/background prior to correct and densify the 3D point cloud coordinates inside and outside the object contour. Compared with unsupervised deep-learning-based depth estimation methods, our method achieves fast and accurate three-dimensional localization of the instance target and its individual components in real-world scenes, reaching an accuracy of more than 90% in indoor scenes.
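The core geometric step described above, back-projecting a stereo disparity map to metric 3D points and filtering them with an instance-segmentation mask as a foreground prior, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the camera parameters, function names, and toy data are assumptions for demonstration.

```python
import numpy as np

def disparity_to_points(disparity, fx, fy, cx, cy, baseline):
    """Back-project pixels with valid disparity to metric 3D coordinates.

    Uses the standard stereo relation Z = fx * baseline / d, then the
    pinhole model to recover X and Y in the camera frame.
    """
    v, u = np.nonzero(disparity > 0)           # pixels with valid disparity
    d = disparity[v, u].astype(np.float64)
    z = fx * baseline / d                      # metric depth from disparity
    x = (u - cx) * z / fx                      # back-project to camera frame
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1), (v, u)

def filter_by_mask(points, pixels, mask):
    """Keep only 3D points whose source pixel lies inside the instance mask
    (the segmentation result acting as a foreground prior)."""
    v, u = pixels
    return points[mask[v, u]]

# Toy example: a 4x4 disparity map with one small textured region,
# and an instance mask covering the left half of the image.
disp = np.zeros((4, 4)); disp[1:3, 0:2] = 8.0
mask = np.zeros((4, 4), bool); mask[:, :2] = True
pts, pix = disparity_to_points(disp, fx=100.0, fy=100.0,
                               cx=2.0, cy=2.0, baseline=0.1)
fg = filter_by_mask(pts, pix, mask)
print(fg.shape)  # four masked pixels -> (4, 3)
```

In the full method, the sparse foreground points would additionally be corrected and densified within the object contour; the sketch only shows the scale-aware back-projection and mask filtering.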
In this paper, a self-propagating video segmentation approach based on patch matching and an enhanced OneCut model is proposed, which takes full advantage of the target's color features, shape features, and motion information. First, an interactive key-frame segmentation is performed to obtain an accurate initial contour of the target. Second, sampling patches are uniformly selected along the target contour of the previous frame to initialize the localized classifiers. Patch matching is then used to propagate this contour to the current frame; the localized classifiers are moved to the current frame in the same way, and their positions and parameters are updated accordingly. Finally, the foreground and background probability maps of the current frame are computed from the localized classifiers together with global probability models, and the enhanced OneCut model is constructed to obtain the segmentation result. Compared with state-of-the-art video segmentation methods, the proposed approach performs strongly on the DAVIS dataset.
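The contour-propagation step above can be illustrated with a minimal patch-matching sketch: for a patch centred on a contour point in the previous frame, search a local window of the current frame for the best match (minimum sum of squared differences) and move the contour point there. This is a toy illustration under assumed names and parameters, not the paper's actual matcher, which also carries localized classifiers along with each patch.

```python
import numpy as np

def match_patch(prev, curr, center, half=1, radius=2):
    """Return the position in `curr` whose patch best matches (minimum SSD)
    the patch of `prev` centred at `center`. `half` is the patch half-size,
    `radius` the search-window radius -- both illustrative defaults."""
    cy, cx = center
    tpl = prev[cy - half:cy + half + 1, cx - half:cx + half + 1]
    best, best_pos = np.inf, center
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            # skip candidate patches that would fall outside the frame
            if (y - half < 0 or x - half < 0 or
                    y + half >= curr.shape[0] or x + half >= curr.shape[1]):
                continue
            cand = curr[y - half:y + half + 1, x - half:x + half + 1]
            ssd = float(((tpl - cand) ** 2).sum())
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos

# Toy example: a bright 3x3 block moves one pixel to the right
# between the previous and current frames.
prev = np.zeros((8, 8)); prev[3:6, 2:5] = 1.0
curr = np.zeros((8, 8)); curr[3:6, 3:6] = 1.0
print(match_patch(prev, curr, (4, 3)))  # contour point shifts to (4, 4)
```

In the full method, each matched patch also drags its localized classifier to the new position, and the propagated contour and classifier outputs feed the enhanced OneCut energy for the final segmentation.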