Our method consists of three modules: multi-view preprocessing, an adaptive encoder-decoder network, and postprocessing with shared occlusion masks, as shown in Figure 2. Inspired by existing learning-based multi-view methods such as [1,3], which use fronto-parallel planes at different depths as hypothesis planes, the first step of our preprocessing is to warp the source images via homographies and then construct cost volumes, as shown in Figure 2. The input sequences are given as image-pose pairs, each consisting of a reference image $I_r$ for depth estimation, $N$ additional source images $\{I_i\}_{i=1}^{N}$, and the corresponding camera parameters.
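The plane-sweep step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the standard plane-induced homography $H_i(d) = K_i (R + t\,n^\top/d) K_r^{-1}$ for a fronto-parallel plane normal $n = (0,0,1)^\top$ in the reference frame, uses nearest-neighbour sampling on single-channel images, and aggregates the warped views into a variance-based cost volume as in MVSNet-style methods; all function names are illustrative.

```python
import numpy as np

def fronto_parallel_homography(K_ref, K_src, R, t, depth):
    """Plane-induced homography H(d) = K_src (R + t n^T / d) K_ref^{-1}
    mapping reference pixels to source pixels for a fronto-parallel
    hypothesis plane at distance `depth` from the reference camera."""
    n = np.array([0.0, 0.0, 1.0])  # plane normal in the reference frame
    return K_src @ (R + np.outer(t, n) / depth) @ np.linalg.inv(K_ref)

def warp_to_reference(src, H):
    """Warp a source image into the reference view by inverse mapping:
    each reference pixel x samples src at H @ x (nearest neighbour)."""
    h, w = src.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # (u, v, 1)
    q = H @ pix
    u = np.round(q[0] / q[2]).astype(int)
    v = np.round(q[1] / q[2]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)  # pixels seen by the source view
    out = np.zeros(h * w)
    flat = src.ravel()
    out[valid] = flat[v[valid] * w + u[valid]]
    return out.reshape(h, w)

def variance_cost_volume(ref, srcs, homographies_per_depth):
    """MVSNet-style cost volume: per-depth variance across the reference
    image and all homography-warped source images, shape (D, H, W)."""
    volume = []
    for Hs in homographies_per_depth:  # one homography per source view
        views = [ref] + [warp_to_reference(s, H) for s, H in zip(srcs, Hs)]
        volume.append(np.var(np.stack(views), axis=0))
    return np.stack(volume)
```

With identical cameras (same intrinsics, identity rotation, zero translation) the homography collapses to the identity, so a source identical to the reference yields zero cost at every depth; a low-variance slice similarly marks the correct depth hypothesis in the general case.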