Six Degrees of Freedom (6DoF) video provides 360° video that accounts for both the user's head and body movement. The IEEE 1857.9 6DoF video representation requires synthesizing virtual views in accordance with the user's movement. During view synthesis, the reference software provided by IEEE 1857.9 requires a depth map as auxiliary information. To reuse existing video coding standards and accelerate their industrial adoption, IEEE 1857.9 provides four depth map compression interfaces: AVC, HEVC, AVS2, and AVS3 (first stage, HPM4.0). Moreover, since depth information is never shown directly to the user, new comparison methods adapted to depth map compression must be developed. In this paper, we investigate the depth video compression performance of the H.264/AVC, H.265/HEVC, and AVS3 software encoders. To ensure a fair analysis, we evaluate each encoder by comparing the objective quality of the synthesized virtual view, using the IEEE 1857.9 reference software for view synthesis. For coding performance evaluation, PSNR and SSIM are used as objective quality metrics. According to experimental results obtained with similar configurations for all examined encoders, the AVS3 reference software implementation provides significant average bit-rate savings of 63% and 19% compared to H.264/AVC and H.265/HEVC, respectively.
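As a minimal sketch of the objective quality metrics named above, the following computes PSNR and a simplified single-window SSIM between a reference frame and a synthesized view. Note this is illustrative only: the abstract does not specify the evaluation code, and a real evaluation (as in the original SSIM paper) uses a sliding Gaussian window rather than global statistics.

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio between reference and test frames, in dB."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(ref, test, max_val=255.0):
    """Simplified SSIM using global image statistics (no sliding window)."""
    ref = ref.astype(np.float64)
    test = test.astype(np.float64)
    c1 = (0.01 * max_val) ** 2  # stabilizing constants from the SSIM paper
    c2 = (0.03 * max_val) ** 2
    mu_x, mu_y = ref.mean(), test.mean()
    var_x, var_y = ref.var(), test.var()
    cov = ((ref - mu_x) * (test - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

In a depth-compression study like this one, the metrics would be applied not to the decoded depth maps themselves but to the virtual views synthesized from them, which is exactly why depth distortion needs its own evaluation methodology.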
In this paper, we propose a learning-based multi-view stereo network built around a feature correlation aggregation network (FCANet). We observe that the source views used to infer the depth of the reference view differ considerably, as reflected in the images themselves. Therefore, each source view should contribute differently when building the cost volume, depending, in our view, on its similarity to the reference view. To this end, we propose FCANet to infer this similarity and guide cost aggregation. In addition, we build the cost volume and infer depth in a coarse-to-fine manner. We evaluate the proposed FCA-MVSNet and conduct ablation studies of FCANet on the DTU dataset. The results show that our method significantly outperforms the baseline and achieves state-of-the-art results; in particular, reconstruction completeness surpasses 0.3 mm on the mean distance metric. Moreover, the proposed FCANet significantly improves reconstruction quality compared with the widely used variance metric.
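The contrast drawn above, between the standard variance metric (which treats all source views equally) and similarity-guided aggregation, can be sketched as follows. This is a simplification under stated assumptions: real FCANet-style methods predict per-view weights with a learned network from feature correlations, whereas here the weights are plain scalars passed in by the caller.

```python
import numpy as np

def variance_cost(feature_volumes):
    """Standard variance-based cost volume: every source view contributes
    equally, regardless of how similar it is to the reference view."""
    stack = np.stack(feature_volumes, axis=0)            # (V, C, D, H, W)
    return np.mean((stack - stack.mean(axis=0)) ** 2, axis=0)

def weighted_cost(ref_volume, src_volumes, weights):
    """Similarity-weighted aggregation: each source view's matching cost
    against the reference is scaled by a per-view weight (here a scalar;
    a learned network would predict these from view similarity)."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()                                      # normalize contributions
    costs = [(src - ref_volume) ** 2 for src in src_volumes]
    return sum(wi * c for wi, c in zip(w, costs))
```

Down-weighting dissimilar (e.g. heavily occluded or oblique) source views keeps their unreliable matching costs from corrupting the aggregated volume, which is the intuition the abstract appeals to.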
Existing learning-based multi-view stereo (MVS) approaches achieve impressive results compared with traditional methods. However, most rely on ground-truth 3D data as supervision, and acquiring high-quality ground truth for diverse scenes is challenging. In this paper, we propose a novel real-time unsupervised multi-view depth estimation network for virtual view synthesis that uses multi-view images themselves as supervision. To improve the completeness and accuracy of the virtual viewpoint, we propose a novel shared occlusion mask that handles artifacts caused by occlusion in the reconstructed image and filters out unreliable points in the depth map. We also design a mask-based photometric loss to guide the network toward more reasonable masks and higher-quality depth maps. Experimental results on the IEEE 1857.9 virtual viewpoint synthesis dataset demonstrate that our method outperforms other recent MVS methods and achieves superior real-time performance.
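A mask-based photometric loss of the general kind described above can be sketched as follows. This is a hypothetical formulation, not the paper's actual loss: it assumes a soft mask in (0, 1], an L1 photometric term restricted to masked pixels, and a regularizer that penalizes the trivial all-zero mask, a common pattern in unsupervised depth estimation.

```python
import numpy as np

def mask_based_photometric_loss(warped, ref, mask, reg_weight=0.1):
    """L1 photometric loss over pixels the (soft) mask marks as reliable,
    plus a regularizer pushing the mask toward 1 so the network cannot
    zero out the loss by masking everything."""
    mask = mask.astype(np.float64)
    valid = mask.sum()
    photo = np.sum(mask * np.abs(warped - ref)) / (valid + 1e-8)
    # log-barrier regularizer: -log(m) grows as the mask value shrinks
    reg = -np.mean(np.log(np.clip(mask, 1e-8, 1.0)))
    return photo + reg_weight * reg
```

With supervision coming only from the multi-view images, such a mask lets the network exclude occluded regions, where photometric consistency is violated even for a correct depth, from the loss.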