REGION-OF-INTEREST 3D VIDEO CODING BASED ON DEPTH IMAGES
L. S. Karlsson, M. Sjöström
Mid Sweden University, Department of Information Technology and Media, SE-851 70 Sundsvall, Sweden
ABSTRACT
Three-dimensional (3D) TV is becoming a mature technology owing to progress in areas such as display and network technology. However, 3D video demands a higher bandwidth in order to transmit the information needed to render or directly display several different views at the receiver. The 2D plus depth representation requires a lower bit rate than most 3D video representations, although the necessary views have to be rendered at the receiver. In this paper we propose to combine the 2D plus depth representation with region-of-interest (ROI) video coding to ensure a higher quality in the parts of the sequence that are of interest to the viewer. These include objects close to the viewer as well as faces. This allows either the bit rate to be reduced by 12-28 %, or the quality within the ROI to be increased by 0.57-1.5 dB when a fixed bit rate is applied.
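As a rough illustration of how a depth image can indicate "objects close to the viewer", the sketch below thresholds a depth map into a binary ROI mask. It is not the detection method of the paper: the function names `depth_roi_mask` and `roi_fraction`, the threshold of 128, and the convention that larger 8-bit depth values mean "closer" are all assumptions made for this example, and face detection is left out entirely.

```python
def depth_roi_mask(depth, threshold=128):
    """Mark pixels whose depth value is at or above `threshold` as ROI.

    Assumption: `depth` is a list of rows of 8-bit values in which
    larger values mean the pixel is closer to the viewer.
    """
    return [[1 if d >= threshold else 0 for d in row] for row in depth]


def roi_fraction(mask):
    """Return the share of pixels that belong to the ROI."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


# Tiny hypothetical 2x4 depth map, for illustration only.
depth = [
    [10, 20, 200, 210],
    [15, 130, 220, 90],
]
mask = depth_roi_mask(depth, threshold=128)
fraction = roi_fraction(mask)
```

An ROI-aware encoder could then spend more bits on blocks that overlap the mask and fewer on the background. How that reallocation is done is outside the scope of this sketch.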
Common autostereoscopic 3D displays are based on multi-view projection. The diversity of resolutions and number of views of such displays means that 3D content formats must be flexible in order to make broadcasting efficient. Furthermore, distribution of content over a heterogeneous network should adapt to the available network capacity. Present scalable video coding provides the ability to adapt to network conditions; it allows for quality, temporal and spatial scaling of 2D video. Scalability for 3D data extends this list to the depth and the view domains. We have introduced scalability with respect to depth information. Our proposed scheme is based on the multi-view-plus-depth format, where the center view data are preserved and side views are extracted in enhancement layers depending on depth values. We investigate the performance of various layer assignment strategies: the number of layers, and the distribution of layers in depth, based either on an equal number of pixels or on histogram characteristics. We further consider the consequences of variable distortion due to encoder parameters. The results are evaluated in terms of overall distortion versus bit rate, distortion per enhancement layer, and visual quality. Scalability with respect to depth (and views) allows for an increased number of quality steps; the cost is a slight increase in the required capacity for the whole sequence. The main advantage is, however, an improved quality for objects close to the viewer, even if overall quality is worse.
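The "equal number of pixels" layer assignment mentioned above can be sketched as a quantile split of the depth values. This is only one plausible reading of that strategy, not the authors' implementation; the function names `equal_count_boundaries` and `assign_layers` and the sample values are hypothetical.

```python
from bisect import bisect_right


def equal_count_boundaries(depth_values, num_layers):
    """Pick depth boundaries so each layer gets about the same pixel count.

    Returns num_layers - 1 boundary values taken from the sorted depths.
    """
    ordered = sorted(depth_values)
    n = len(ordered)
    return [ordered[(k * n) // num_layers] for k in range(1, num_layers)]


def assign_layers(depth_values, boundaries):
    """Map each depth value to a layer index (0 = smallest depth values)."""
    return [bisect_right(boundaries, d) for d in depth_values]


# Hypothetical depth samples, flattened from a depth image.
depth_values = [5, 250, 100, 30, 200, 60, 180, 90]
boundaries = equal_count_boundaries(depth_values, num_layers=4)
layers = assign_layers(depth_values, boundaries)
```

With many repeated depth values the layers can end up unequal in size, because all pixels sharing a depth value land in the same layer. A histogram-based assignment would instead place boundaries according to the shape of the depth histogram.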
The present work analyses a layered depth-image-based rendering algorithm with respect to the errors that can occur in perspective 3D warping. The outcome is a set of improvements that let the algorithm treat depth reliably for scenes containing several levels of foreground objects. The filling of holes of different kinds is addressed so that results have better visual quality. The analysis compares the results of the algorithm with those of a reference algorithm for the potential error types, and visual examples exhibit the consequences of the improvements. Different objective metrics give ambiguous results, which may be explained by the reduction of structure caused by the reference algorithm.