In this paper, we propose a novel approach to simultaneous localization and mapping (SLAM) combined with object recognition (OR) that uses visual sensor data in open environments and is capable of working on sparse point clouds. In the proposed algorithm, ORB-SLAM uses the current and previous frames from a monocular video sensor to determine the observer's position and a cloud of points that represent objects in the environment, while a deep neural network uses the current frame to detect and recognize objects. In the next step, the sparse point cloud returned by the SLAM algorithm is compared with the area recognized by the OR network. Because each point of the 3D map has a counterpart in the current frame, the points can be filtered to those matching the area recognized by the OR algorithm. A clustering algorithm then determines the areas in which points are densely distributed in order to find the spatial positions of the objects detected by OR. Then, using a principal component analysis (PCA)-based heuristic, we estimate the bounding boxes of the detected objects. The main novelties of our solution are the image processing pipeline, which uses sparse point clouds generated by SLAM to determine the positions of objects recognized by a deep neural network, and the PCA heuristic. In contrast to state-of-the-art approaches, our algorithm does not require additional calculations such as the generation of dense point clouds for object positioning, which greatly simplifies the task. We evaluated our approach on a large benchmark dataset using various state-of-the-art OR architectures (YOLO, MobileNet, RetinaNet) and clustering algorithms (DBSCAN and OPTICS), obtaining promising results. Both our source code and evaluation data sets are available for download, so our results can be easily reproduced.
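The filtering and bounding-box steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `points_in_box` and `pca_bounding_box`, the array layouts, and the box format `(x_min, y_min, x_max, y_max)` are assumptions made for illustration, and the clustering step (DBSCAN or OPTICS in the abstract) is omitted, so the filtered points are assumed to already form a single cluster.

```python
import numpy as np


def points_in_box(points_3d, pixels_2d, box):
    """Keep 3D map points whose image counterpart lies inside a detection box.

    points_3d: (N, 3) map points; pixels_2d: (N, 2) pixel coordinates of the
    same points in the current frame; box: (x_min, y_min, x_max, y_max).
    All three layouts are assumptions of this sketch.
    """
    x_min, y_min, x_max, y_max = box
    u, v = pixels_2d[:, 0], pixels_2d[:, 1]
    mask = (u >= x_min) & (u <= x_max) & (v >= y_min) & (v <= y_max)
    return points_3d[mask]


def pca_bounding_box(points):
    """Oriented bounding box aligned with the principal axes of a cluster.

    Returns (center, axes, extents): the rows of `axes` are unit principal
    directions (largest variance first) and `extents` holds the box side
    lengths along those directions.
    """
    mean = points.mean(axis=0)
    centered = points - mean
    # Eigenvectors of the covariance matrix are the principal directions.
    cov = np.cov(centered, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back in ascending order
    axes = eigvecs.T[::-1]             # one axis per row, largest variance first
    coords = centered @ axes.T         # point coordinates in the PCA frame
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    # Box midpoint in the PCA frame, mapped back to map coordinates.
    center = mean + ((lo + hi) / 2.0) @ axes
    return center, axes, hi - lo
```

In a fuller pipeline, the output of `points_in_box` would first be passed to a density-based clustering step, and `pca_bounding_box` would be applied to each resulting cluster.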
Deep learning-based feature extraction methods and transfer learning have become common approaches in the field of pattern recognition. Deep convolutional neural networks trained using triplet-based loss functions allow for the generation of face embeddings, which can be directly applied to face verification and clustering. Knowledge about the ground truth of face identities might improve the effectiveness of the final classification algorithm; however, it is also possible to use clusters previously discovered by an unsupervised approach in place of the ground truth. The aim of this paper is to evaluate the potential improvement of classification results of state-of-the-art supervised classification methods trained with and without ground truth knowledge. In this study, we use two sufficiently large data sets containing more than 200,000 “taken in the wild” images, each with various resolutions, visual quality, and face poses which, in our opinion, guarantee the statistical significance of the results. We examine several clustering and supervised pattern recognition algorithms and find that knowledge about the ground truth has a very small influence on the Fowlkes–Mallows score (FMS) of the classification algorithm. In the case of the classification algorithm that obtained the highest accuracy in our experiment, the FMS improved by only 5.3% (from 0.749 to 0.791) in the first data set and by 6.6% (from 0.652 to 0.718) in the second data set. Our results show that, except for highly secure systems in which face verification is a key component, face identities discovered by unsupervised approaches can be safely used for training supervised classifiers. We also found that the Silhouette Coefficient (SC) of unsupervised clustering is positively correlated with the Adjusted Rand Index, V-measure score, and Fowlkes–Mallows score, so the SC can be used as an indicator of clustering performance when the ground truth of face identities is not known.
All of these conclusions are important findings for large-scale face verification problems, because skipping the verification of people’s identities before supervised training saves considerable time and resources.
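For readers unfamiliar with the Fowlkes–Mallows score used above, the following is a minimal sketch of the standard pair-counting definition, FMS = TP / sqrt((TP + FP)(TP + FN)), where TP counts pairs of samples grouped together in both labelings. The function name `fowlkes_mallows` and the convention of returning 0.0 when either labeling has no same-cluster pairs are choices of this sketch, not something taken from the paper.

```python
from collections import Counter
from math import comb, sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Fowlkes-Mallows score computed by counting sample pairs."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    # Pairs placed together in both labelings: true positives (TP).
    tp = sum(comb(n, 2) for n in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in the predicted labeling: TP + FP.
    same_pred = sum(comb(n, 2) for n in Counter(labels_pred).values())
    # Pairs placed together in the reference labeling: TP + FN.
    same_true = sum(comb(n, 2) for n in Counter(labels_true).values())
    if same_pred == 0 or same_true == 0:
        return 0.0  # convention chosen for this sketch
    return tp / sqrt(same_pred * same_true)
```

The score depends only on which samples are grouped together, not on the label values, so renaming the clusters leaves it unchanged. That property is what makes it usable for comparing identities discovered by clustering against ground truth identities.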