To improve prediction accuracy while keeping execution time low during image depth map generation, we investigate unsupervised monocular image depth prediction. In this paper, an unsupervised monocular image depth prediction method based on multi-loss deep learning is designed from the following two aspects. First, a monocular image depth estimation algorithm based on multi-scale feature extraction is proposed, which consists of two parts: a feature extraction network and a deconvolution prediction network. The feature extraction network extracts image features at different levels of the network and feeds the acquired multi-scale features into the deconvolution layers without changing the image resolution. After training, the network predicts the left and right disparity maps. Second, we provide a new multiple loss function based on the asymmetric parameters of the training model and the epipolar geometry constraint. The Multi-Scale Structural Similarity Index (MS-SSIM) and the L1 norm are combined as the image reconstruction loss, and the left-right disparity consistency and the flipped left-right disparity consistency are incorporated into the loss function used to train the network model. Simulation results show that this method effectively improves prediction accuracy, particularly for complex images containing mirrors, transparent objects, and shadows. The KITTI dataset is further used to evaluate our method, which achieves end-to-end results that even exceed those of a supervised method.
INDEX TERMS Depth estimation, convolutional neural network, unsupervised, feature extraction.
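The loss terms named in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: a single-scale SSIM computed from whole-image statistics stands in for MS-SSIM, the weight `alpha=0.85` is an assumed value, and the function names and the disparity sign convention are hypothetical choices made for illustration.

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-scale SSIM from whole-image statistics (inputs scaled to [0, 1]).

    A simplified stand-in for MS-SSIM, which averages windowed SSIM over
    several image scales.
    """
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def reconstruction_loss(target, recon, alpha=0.85):
    """Weighted sum of a structural term and an L1 term.

    alpha is an assumed weighting, not a value taken from the paper.
    """
    l1 = np.abs(target - recon).mean()
    return alpha * (1.0 - ssim_global(target, recon)) / 2.0 + (1.0 - alpha) * l1

def lr_consistency_loss(disp_left, disp_right):
    """Mean |d_l(x) - d_r(x - d_l(x))| using nearest-pixel sampling along rows.

    The sign convention (left pixel x matches right pixel x - d_l) is assumed.
    """
    h, w = disp_left.shape
    cols = np.arange(w)[None, :] - np.rint(disp_left).astype(int)
    cols = np.clip(cols, 0, w - 1)          # clamp samples that leave the image
    rows = np.arange(h)[:, None]
    return np.abs(disp_left - disp_right[rows, cols]).mean()
```

A total training loss would add these terms (and a flipped-consistency term) with weights chosen by validation; in a real network the sampling step would need to be differentiable, for example bilinear rather than nearest-pixel.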
To address the high cost and the low accuracy of scene details in depth map generation for 3D reconstruction, we propose an unsupervised monocular image depth prediction algorithm based on Fourier domain analysis. Generally speaking, small-scale images better capture depth details, while large-scale images more reliably capture the depth distribution of the entire image. To take advantage of these complementary properties, we crop the input image at different crop ratios to generate multiple disparity map candidates, and then use Fourier frequency domain analysis to fuse the disparity map candidates into left and right disparity maps. At the same time, we propose a loss function based on MSSIM to compensate for the difference between the left and right views and to train the unsupervised monocular image depth prediction model. Experimental results show that our method performs well on the KITTI dataset.
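The idea of fusing disparity candidates in the frequency domain can be sketched as follows. The abstract does not specify the fusion rule, so this example assumes a simple one: low spatial frequencies come from a candidate that is reliable for the overall depth distribution (`coarse`), and high frequencies come from a candidate that is reliable for details (`fine`). The function name, the two-candidate setup, and the `cutoff` value are all hypothetical.

```python
import numpy as np

def fuse_fourier(coarse, fine, cutoff=0.1):
    """Fuse two same-sized disparity maps in the Fourier domain.

    Frequencies with radius <= cutoff (in cycles per pixel) are taken from
    `coarse`; all others are taken from `fine`. This hard radial mask is an
    assumed rule, not the fusion scheme of the cited paper.
    """
    assert coarse.shape == fine.shape
    h, w = coarse.shape
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies
    low = np.sqrt(fx ** 2 + fy ** 2) <= cutoff
    spectrum = np.where(low, np.fft.fft2(coarse), np.fft.fft2(fine))
    # The mask is symmetric, so the inverse transform is real up to rounding.
    return np.real(np.fft.ifft2(spectrum))
```

With more than two candidates, the same pattern extends to per-frequency weights that sum to one across candidates, which is one way the crop-ratio candidates described above could be combined.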