Real-time stereo analysis is an important research area in computer vision. In this context, we propose a stereo algorithm for an immersive video-conferencing system in which conferees at different geographical locations can meet under conditions similar to those of a real-world meeting. For this purpose, virtual views of the remote conferees are generated and adapted to the current viewpoint of the local participant. Dense disparity fields of high accuracy are required to guarantee adequate quality of the virtual views. Owing to the use of a wide-baseline system with strongly convergent camera configurations, the dynamic disparity range is about 150 pixels. Considering computational cost, a full search, or even a local search restricted to a small window of a few pixels as implemented in many real-time algorithms, is not suitable for our application, because we target processing of full-resolution video according to the CCIR 601 TV standard at 25 frames per second, most desirably as a pure software solution running on available processors without any support from dedicated hardware. We therefore propose a new fast algorithm for stereo analysis that circumvents the window search by using a hybrid recursive matching strategy based on the effective selection of a small number of candidates. However, stereo analysis requires more than a straightforward application of stereo matching. The crucial problem is to produce accurate stereo correspondences in all parts of the image. In particular, errors in occluded regions and in homogeneous or weakly structured regions lead to disturbing artifacts in the synthesized virtual views. To cope with this problem, mismatches have to be detected and substituted by a sophisticated interpolation and extrapolation scheme.
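To illustrate the general idea of candidate-based recursive matching described above, the following Python sketch replaces the full disparity search by a handful of candidates taken from already-matched spatial neighbours and from the previous frame, followed by a small local update. This is a minimal sketch of the principle, not the authors' implementation; the block size, candidate set, and SAD cost are assumptions made for clarity.

```python
import numpy as np

def sad_cost(left, right, y, x, d, block=5):
    """Sum of absolute differences between a block around (y, x) in the left
    image and the block shifted by disparity d in the right image."""
    h = block // 2
    xr = x - d
    if (xr - h < 0 or xr + h >= right.shape[1] or
            x - h < 0 or x + h >= left.shape[1] or
            y - h < 0 or y + h >= left.shape[0]):
        return np.inf
    a = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.float32)
    b = right[y - h:y + h + 1, xr - h:xr + h + 1].astype(np.float32)
    return float(np.abs(a - b).sum())

def hybrid_recursive_matching(left, right, prev_disp=None, max_disp=150):
    """Sketch of candidate-based recursive matching: each pixel tests only a
    few disparity candidates instead of scanning the full disparity range."""
    H, W = left.shape
    disp = np.zeros((H, W), dtype=np.int32)
    for y in range(H):
        for x in range(W):
            candidates = {0}
            if x > 0:
                candidates.add(int(disp[y, x - 1]))    # spatial predecessor (left)
            if y > 0:
                candidates.add(int(disp[y - 1, x]))    # spatial predecessor (above)
            if prev_disp is not None:
                candidates.add(int(prev_disp[y, x]))   # temporal predecessor
            best_d, best_c = 0, np.inf
            for d in candidates:
                for dd in (d - 1, d, d + 1):           # small local update step
                    if 0 <= dd <= max_disp:
                        c = sad_cost(left, right, y, x, dd)
                        if c < best_c:
                            best_c, best_d = c, dd
            disp[y, x] = best_d
    return disp
```

Because every pixel evaluates only a few candidates, the cost per frame is roughly independent of the 150-pixel disparity range, which is what makes a pure software real-time solution plausible.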
Traditional set-top camera video-conferencing systems still fail to meet the 'telepresence challenge' of providing a viable alternative to physical business travel, which is nowadays characterized by unacceptable delays, costs, inconvenience, and an increasingly large ecological footprint. Even recent high-end commercial solutions, while removing some of these traditional shortcomings, still suffer from poor scalability, expensive implementations, the lack of life-sized 3D representations of the remote participants, and only very limited support for eye contact and gesture-based interaction. The European FP7 project 3DPresence will develop a multi-party, high-end 3D videoconferencing concept that tackles the problem of transmitting the feeling of physical presence in real time to multiple remote locations in a transparent and natural way. In this paper, we present the overall concept, including the geometrical design of the whole prototype demonstrator, the arrangement of the cameras and displays, and the general multi-view video analysis chain. The driving force behind the design strategy is to fulfil the requirements of a novel 3D immersive videoconferencing system, including directional eye gaze and gesture awareness.
Interest in immersive 3D videoconferencing systems has existed for many years, both from a commercialization point of view and from a research perspective. Still, one of the major bottlenecks in this context is the computational complexity of the required algorithmic modules. This paper discusses this problem from a hardware point of view. On the one hand, we use recent fast graphics boards, which allow a high degree of algorithmic parallelization in consumer PC environments; on the other hand, we exploit the processing capabilities of state-of-the-art multi-core CPUs. We propose a novel scalable and high-performance 3D acquisition framework for immersive 3D videoconferencing systems that benefits from both. In this way we are able to integrate complex computer vision algorithms, such as visual hull computation, multi-view stereo matching, segmentation, image rectification, lens distortion correction, and virtual view synthesis, as well as data encoding, network signaling, and capture for 16 HD cameras, into one real-time framework. This paper is based on results and experiences of the European FP7 research project 3DPresence, which aims to build a real-time, three-party, multi-user 3D videoconferencing system.
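The paper describes the acquisition framework at a system level. Purely as an illustration of how such a multi-camera pipeline can be scheduled, the sketch below runs the per-camera, embarrassingly parallel stages concurrently and then performs stereo matching on camera pairs; all stage functions, the pairing scheme, and the use of a thread pool are hypothetical placeholders, not details from the paper, and the GPU kernels are represented by stand-in functions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Frame:
    camera_id: int
    data: object  # raw image buffer; placeholder type

# Hypothetical stage functions, stand-ins for the real GPU/CPU kernels.
def undistort_and_rectify(frame: Frame) -> Frame:
    """Lens distortion correction and rectification (per-pixel, GPU-friendly)."""
    return frame

def segment_foreground(frame: Frame) -> Frame:
    """Foreground/background segmentation, e.g. as input to a visual hull."""
    return frame

def stereo_match(pair):
    """Multi-view stereo matching on a rectified camera pair."""
    left, right = pair
    return {"pair": (left.camera_id, right.camera_id), "depth": None}

def process_camera(frame: Frame) -> Frame:
    # Per-camera stages run independently and can be parallelized freely.
    return segment_foreground(undistort_and_rectify(frame))

def process_frame_set(frames):
    """One time step: 16 camera frames in, depth maps for selected pairs out."""
    with ThreadPoolExecutor() as pool:
        prepped = list(pool.map(process_camera, frames))
        pairs = list(zip(prepped[::2], prepped[1::2]))  # assumed pairing scheme
        depths = list(pool.map(stereo_match, pairs))
    return depths

if __name__ == "__main__":
    frames = [Frame(camera_id=i, data=None) for i in range(16)]
    print(len(process_frame_set(frames)), "depth maps per time step")
```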
Interoperability, scalability, and adaptability are important features for a successful introduction of future 3D TV services. Hence, new concepts must be able to adapt the multi-view geometry of the capturing system to the geometry of the 3D reproduction system. An approach is discussed that addresses these adaptation issues based on the concept of an N × video-plus-depth data representation. The core algorithms for depth map creation on the analysis side and depth-image-based rendering on the reproduction side are presented.
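As a minimal sketch of depth-image-based rendering for a rectified setup, the following Python function forward-warps each pixel of a reference view horizontally by a disparity derived from its depth value. The 8-bit depth quantization convention and the parameter names (baseline_shift, focal, z_near, z_far) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def dibr_forward_warp(color, depth, baseline_shift, focal, z_near, z_far):
    """Forward-warp a reference view plus depth map to a virtual viewpoint.
    Assumes rectified views and an 8-bit depth map quantized between
    z_near and z_far (a common MPEG-style convention)."""
    H, W = depth.shape
    virtual = np.zeros_like(color)
    zbuf = np.full((H, W), np.inf)
    # Recover metric depth from the 8-bit depth map.
    z = 1.0 / (depth / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
    # Convert depth to a horizontal disparity for the desired virtual baseline.
    disparity = (focal * baseline_shift / z).round().astype(np.int32)
    for y in range(H):
        for x in range(W):
            xv = x + disparity[y, x]
            if 0 <= xv < W and z[y, x] < zbuf[y, xv]:  # nearer pixels win
                zbuf[y, xv] = z[y, x]
                virtual[y, xv] = color[y, x]
    return virtual  # disocclusion holes remain empty and need filling/inpainting
```

With N such video-plus-depth streams, the same warping step can be applied per stream and the results blended, which is what allows the capture geometry to be adapted to different reproduction geometries.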