Abstract: We present a novel framework for the analysis and optimization of encoding latency in multiview video. First, we characterize the elements that influence encoding latency: 1) the multiview prediction structure and 2) the hardware encoder model. Then, we provide algorithms to find the encoding latency of any arbitrary multiview prediction structure. The proposed framework relies on the directed acyclic graph encoder latency (DAGEL) model, which provides an abstraction of the processing capacity of the encoder by considering an unbounded number of processors. Using graph-theoretic algorithms, the DAGEL model allows us to compute the encoding latency of a given prediction structure and to determine the contribution of each prediction dependency to it. As an example of DAGEL application, we propose an algorithm that reduces the encoding latency of a given multiview prediction structure to a target value. Our approach prunes a minimum number of frame dependencies until the target latency is achieved, thus minimizing the rate-distortion degradation caused by the removal of prediction dependencies. Finally, we analyze the latency performance of the DAGEL-derived prediction structures in multiview encoders with limited processing capacity.

Index Terms: Free viewpoint video, low latency, multiview coding, prediction structures, three-dimensional video (3DV), video-conference.
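With an unbounded number of processors, the encoding latency of a prediction structure reduces to the weight of the critical (longest) path through its dependency DAG. The following minimal sketch illustrates that idea; the dictionary-based graph representation, the uniform per-frame encoding times, and the toy structure are assumptions for illustration, not the paper's exact formulation.

# Minimal sketch of a DAGEL-style latency computation (illustrative only).
# With unbounded processors, each frame can start as soon as all the frames
# it predicts from have finished, so the latency is the critical-path weight.
from graphlib import TopologicalSorter

def encoding_latency(deps, enc_time):
    """deps:     {frame: set of frames it predicts from}
       enc_time: {frame: encoding time of that frame}
       Returns the critical-path latency of the prediction structure."""
    finish = {}
    for f in TopologicalSorter(deps).static_order():  # predecessors first
        start = max((finish[p] for p in deps.get(f, ())), default=0.0)
        finish[f] = start + enc_time[f]
    return max(finish.values())

# Toy prediction structure: I0 feeds P1 and P2; B3 predicts from both.
deps = {"I0": set(), "P1": {"I0"}, "P2": {"I0"}, "B3": {"P1", "P2"}}
t = {f: 1.0 for f in deps}           # unit encoding time per frame (assumed)
print(encoding_latency(deps, t))     # 3.0: I0 -> P1/P2 -> B3

A dependency-pruning pass in the spirit of the proposed algorithm would repeatedly remove an edge on the critical path and recompute this value until the target latency is met.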
360° video, which presents views consistent with the rotation of the viewer's head along three axes (roll, pitch, yaw), is the current approach to creating immersive video experiences. Nevertheless, a more natural, photorealistic experience is desired, one that supports the visual cues that facilitate coherent psycho-visual sensory fusion without the side effect of cybersickness. 360° video applications that additionally enable the user to translate in the x, y, and z directions are clearly the next frontier toward that goal. Such support of full Six Degrees of Freedom (6DoF) for next-generation immersive video is a natural application for light fields. However, a significant obstacle to the adoption of light field technologies is the large amount of data needed to ensure that the light rays corresponding to the viewer's 6DoF position are properly delivered, either from captured light information or synthesized from available views. Experiments to improve known methods for view synthesis and depth estimation are therefore a fundamental next step toward establishing a reference framework within which compression technologies can be evaluated. This paper describes a testbed and experiments that enable smooth and artefact-free view transitions, which can later be used in a framework to study how best to compress the data.
Abstract: We present preliminary experiments on the subjective evaluation of Super Multiview Video (SMV) on stereoscopic and auto-stereoscopic displays. SMV displays require a large number of views (typically 80 or more) but are not yet widely available. Subjective evaluation on legacy displays, though not optimal, will therefore be necessary for the development of SMV video technologies. This has led us to perform a standardized subjective evaluation of uncompressed SMV test sequences, simulating SMV displays through a view sweep controlled by three parameters: View-Sweep Speed (VSS), Viewing Range, and View Density (VD). Our analysis identifies the ranges of VSS and VD values that provide a comfortable view sweep with smooth transitions between views.
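A view sweep of this kind can be simulated by rendering, at each output frame, a single view whose index follows a triangle wave across the available views. The sketch below is only illustrative; the exact mapping from VSS, Viewing Range, and VD to view indices, and all parameter names and units, are assumptions rather than the paper's protocol.

# Illustrative sketch of a view sweep for simulating an SMV display on a
# conventional screen (parameterization and units are assumptions).
def view_sweep(n_views, vss, vd, fps, duration_s):
    """n_views:  number of views in the captured viewing range
       vss:      view-sweep speed, in view positions per second
       vd:       view density in (0, 1]; lower values skip views
       Returns one view index per output frame, sweeping back and forth."""
    step = max(1, round(1 / vd))               # lower density -> larger step
    usable = list(range(0, n_views, step))     # views actually displayed
    if len(usable) < 2:                        # degenerate one-view sweep
        return usable * int(duration_s * fps)
    period = 2 * (len(usable) - 1)             # one forward + backward pass
    indices = []
    for frame in range(int(duration_s * fps)):
        k = int(frame * vss / fps) % period    # position along the sweep
        indices.append(usable[k] if k < len(usable) else usable[period - k])
    return indices

# Example: 80 views swept at 30 views/s, full density, 30 fps, 2 seconds
frames = view_sweep(80, vss=30, vd=1.0, fps=30, duration_s=2)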
Pedestrian detection is nowadays one of the pivotal fields in computer vision, especially when performed over video surveillance scenarios. People detection methods are highly sensitive to occlusions among pedestrians, which dramatically degrade performance in crowded scenarios. Falling camera prices have allowed multi-camera set-ups to become widespread; these set-ups can better cope with occlusions by using different points of view to disambiguate detections. In this paper we present an approach that improves the performance of these multi-camera systems and makes them independent of the considered scenario, via an automatic understanding of the scene content. This semantic information, obtained from semantic segmentation, is used 1) to automatically generate a common Area of Interest for all cameras, instead of the usual manual definition of this area, and 2) to improve the 2D detections of each camera via an optimization technique that maximizes the coherence of every detection both across all 2D views and in the 3D world, obtaining best-fitted bounding boxes and a consensus height for every pedestrian. Experimental results on five publicly available datasets show that the proposed approach, which requires no training stage, outperforms state-of-the-art multi-camera pedestrian detectors not specifically trained for these datasets, demonstrating the expected semantic-based robustness to different scenarios.
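The cross-view coherence idea can be sketched as follows: the foot point of each camera's bounding box is projected onto the common ground plane, and detections that agree across views are fused into a consensus position. This is a hedged illustration only; the homography-based projection, the median-based agreement test, and all names below are assumptions, not the paper's optimization.

# Hedged sketch of multi-camera consensus on the ground plane
# (a simple proxy for the cross-view coherence maximized in the paper).
import numpy as np

def to_ground(H, foot_xy):
    """Map an image point (foot of a bounding box) to ground-plane
    coordinates using a 3x3 image-to-ground homography H."""
    p = H @ np.array([foot_xy[0], foot_xy[1], 1.0])
    return p[:2] / p[2]

def consensus_position(detections, homographies, max_dist=0.5):
    """detections:   {cam_id: (x, y) image foot point of one pedestrian}
       homographies: {cam_id: 3x3 image-to-ground homography}
       Returns the mean ground position of the views that agree, i.e.
       those within max_dist metres of the median projection."""
    pts = np.array([to_ground(homographies[c], d)
                    for c, d in detections.items()])
    med = np.median(pts, axis=0)
    ok = np.linalg.norm(pts - med, axis=1) <= max_dist
    return pts[ok].mean(axis=0) if ok.any() else med

In the same spirit, a consensus height for each pedestrian could be obtained by averaging the heights implied by each view's bounding box at the fused ground position.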