One of the most important functions of the human visual system is the three-dimensional perception of the surrounding environment. The image formed on the retina is a projection of light reflected from the real scene. Because the two eyes are horizontally offset, the images formed on the two retinas differ: a point in three-dimensional space does not project to the same position in both images, and this disparity is the basis of depth perception. This observation has motivated extensive research on extracting depth from two or more images of the same scene taken from different positions. Reconstructing the appearance of 3D shapes from photographs by computational techniques is of great interest in machine vision research, and the high error rates and computational costs of existing methods have driven the development of new ones. Today, 3D models of physical environments can be created from images using machine vision methods, which also have many other applications, including optical character recognition, medical imaging, surveillance, and security. In stereo reconstruction, a pair of images taken of the real world is used to estimate three-dimensional structure by computing the disparity of corresponding pixels.

In this paper, a framework for creating a disparity map using deep learning, called SIUDL, is presented to solve or reduce the existing problems. Evaluated on criteria such as RMSE and 3-pixel error, it achieves a lower error rate and better efficiency than methods such as DispNet, GA-Net, and PSMNet. The simulation results show that our proposed method reduces the error rate by 60% compared to these methods.
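To make the disparity-to-depth relationship concrete, the following is a minimal sketch (not the paper's SIUDL method): given a disparity map estimated from a rectified stereo pair, depth follows from the standard pinhole relation Z = f·B/d, where f is the focal length in pixels, B the baseline between the two cameras, and d the disparity. A simple version of the 3-pixel error metric mentioned above is also shown. The focal length, baseline, and example values are hypothetical illustration numbers, not values from the paper.

```python
# Sketch of stereo depth recovery and a 3-pixel error metric.
# All numeric values here are made-up examples, not the paper's data.
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) to a depth map (metres) via Z = f*B/d."""
    d = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(d, np.inf)      # zero disparity -> point at infinity
    valid = d > eps
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

def three_pixel_error(pred, gt):
    """Fraction of pixels whose disparity error exceeds 3 pixels
    (a simplified form of the common 3-pixel error criterion)."""
    err = np.abs(np.asarray(pred, float) - np.asarray(gt, float))
    return float(np.mean(err > 3.0))

# Example: a 2x2 disparity map with a hypothetical 700 px focal length
# and a 0.54 m baseline.
disp = np.array([[35.0, 70.0],
                 [ 7.0,  0.0]])
depth = disparity_to_depth(disp, focal_px=700.0, baseline_m=0.54)
# Larger disparity means a closer point: 70 px -> 5.4 m, 35 px -> 10.8 m.

gt = disp.copy()
pred = disp + np.array([[0.0, 4.0],
                        [1.0, 0.0]])     # one pixel off by more than 3 px
err_rate = three_pixel_error(pred, gt)   # 1 of 4 pixels exceeds 3 px
```

The inverse relation between disparity and depth is why nearby objects are the easiest to range with a stereo rig, while depth precision degrades quadratically with distance.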