Introduction

Three-dimensional (3D) video signal processing has become a major trend in the visual processing field. As 3D display technology matures, viewers increasingly expect video experiences that are closer to reality. Emerging 3D displays generate better stereo effects than conventional 2D displays. However, existing 2D content does not record depth information, and the lack of an effective 3D content generation approach is a dilemma for the 3D industry. Therefore, depth-aware video processing has become an important issue.

The main concept of depth generation is to retrieve third-dimension information from multiple depth cues in existing 2D video content. When viewing the real world, the human brain integrates various heuristic depth cues to generate depth perception. Therefore, in order to create a high-quality depth map from single-view video, both binocular and monocular cues need to be explored. In this paper, we propose a depth-aware video processing system. The proposed system provides a depth map and multi-view video for 3D/stereoscopic displays and can also enhance depth perception on conventional 2D displays.
Proposed Video Processing System

The proposed system has three major cores: depth generation, depth-aware 2D video enhancement, and multi-view depth image-based rendering (DIBR) [7], as shown in Figure 1. The details of each part are explained in the following sub-sections.
Depth Generation

The depth generation module is the major part of the system. It combines multiple depth cues: depth from motion parallax (DMP), depth from geometrical perspective (DGP), and depth from relative position (DRP). These cues are integrated by a priority depth fusion method, as shown in Figure 2.

For the binocular depth cue, multiple frames in the temporal domain are used to approximate two views in the spatial domain [5]. First, the camera motion of consecutive frames with respect to the current frame is analyzed by global motion estimation. The frame with the most suitable baseline is selected and warped to form a parallel-view configuration with the current frame, compensating for the camera motion. The warped frame and the current frame can therefore be treated as an approximate binocular pair, and the disparity in the spatial domain can be seen as motion parallax in the temporal domain. Motion parallax is computed by block-based matching with a spatial smoothness constraint, which is applied to generate a smooth motion-vector field using a local optimization method.
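As a rough illustration of the final step, the sketch below estimates per-block horizontal parallax between the current frame and the warped reference frame using block-based SAD matching, with a simple smoothness penalty that favors vectors close to the left neighbor's. The function name, parameters, and the exact penalty form are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def block_parallax(cur, ref, block=8, search=4, lam=0.1):
    """Per-block horizontal parallax between the current frame `cur`
    and the motion-compensated reference frame `ref`.

    Cost = SAD + lam * block_area * |d - left neighbor's d|, a minimal
    local-optimization stand-in for the spatial smoothness term; the
    penalty weighting is an assumption for illustration only.
    """
    h, w = cur.shape
    bh, bw = h // block, w // block
    parallax = np.zeros((bh, bw), dtype=np.int32)
    for by in range(bh):
        for bx in range(bw):
            y, x = by * block, bx * block
            cur_blk = cur[y:y + block, x:x + block].astype(np.float64)
            prev_d = parallax[by, bx - 1] if bx > 0 else 0  # left neighbor
            best_cost, best_d = np.inf, 0
            for d in range(-search, search + 1):
                xs = x + d
                if xs < 0 or xs + block > w:
                    continue  # candidate block falls outside the frame
                ref_blk = ref[y:y + block, xs:xs + block].astype(np.float64)
                sad = np.abs(cur_blk - ref_blk).sum()
                cost = sad + lam * block * block * abs(d - prev_d)
                if cost < best_cost:
                    best_cost, best_d = cost, d
            parallax[by, bx] = best_d
    return parallax
```

Because each block reuses the decision of its already-estimated neighbor, the penalty propagates left to right and suppresses isolated outlier vectors, which is the practical effect of the smoothness constraint described above.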