Recent work on depth estimation has so far focused only on projective images, ignoring 360° content, which is now increasingly and more easily produced. We show that monocular depth estimation models trained on traditional images produce sub-optimal results on omnidirectional images, showcasing the need for training directly on 360° datasets, which, however, are hard to acquire. In this work, we circumvent the challenges associated with acquiring high-quality 360° datasets with ground-truth depth annotations by re-using recently released large-scale 3D datasets and re-purposing them to 360° via rendering. The resulting dataset, which is considerably larger than similar projective datasets, is publicly offered to the community to enable future research in this direction. We use this dataset to learn, in an end-to-end fashion, the task of depth estimation from 360° images. We show promising results on our synthesized data as well as on unseen realistic images.
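As context for the rendering-based re-purposing described above, the core geometric step is projecting 3D points onto an equirectangular image. The sketch below (Python/NumPy, written purely for illustration; the axis conventions, function name, and omission of visibility handling are assumptions, not details taken from the paper) maps points in camera coordinates to 360° pixel coordinates and their radial depth.

```python
import numpy as np

def project_to_equirectangular(points, width, height):
    """Project 3D points (N, 3) in camera coordinates onto an
    equirectangular image of size (height, width).

    Convention (an assumption; datasets differ): +z forward, +x right,
    +y down. Returns pixel coordinates (u, v) and radial depth r.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)                     # radial depth per point
    lon = np.arctan2(x, z)                              # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(y / np.maximum(r, 1e-8), -1.0, 1.0))  # latitude
    u = (lon / (2 * np.pi) + 0.5) * width               # wrap longitude to [0, W)
    v = (lat / np.pi + 0.5) * height                    # map latitude to [0, H)
    return u, v, r
```

A full renderer would additionally resolve occlusions, for example by z-buffering over the radial depth, before writing out the 360° depth map.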
Learning-based approaches for depth perception are limited by the availability of clean training data. This has led to the use of view synthesis as an indirect objective for learning depth estimation through efficient data acquisition procedures. Nonetheless, most research focuses on pinhole-based monocular vision, with scarce works presenting results for omnidirectional input. In this work, we explore spherical view synthesis for learning monocular 360° depth in a self-supervised manner and demonstrate its feasibility. Under a purely geometrically derived formulation, we present results for horizontal and vertical baselines, as well as for the trinocular case. Further, we show how to better exploit the expressiveness of traditional CNNs when applied to the equirectangular domain in an efficient manner. Finally, given the availability of ground-truth depth data, our work is uniquely positioned to compare view synthesis against direct supervision in a consistent and fair manner. The results indicate that alternative research directions might be better suited to enable higher-quality depth perception. Our data, models and code are publicly available at https
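To illustrate the purely geometric formulation of spherical view synthesis, the following sketch computes, for a vertical baseline, where each pixel of a reference equirectangular view lands in a displaced view given its predicted depth; such warped coordinates could then drive differentiable backward sampling for a photometric loss. The axis conventions, function name, and array layouts are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def synthesize_vertical_view(depth, baseline):
    """Given an equirectangular depth map (H, W) for a reference view,
    compute where each reference pixel lands in a second view displaced
    by `baseline` metres along the vertical axis.

    A purely geometric sketch: unproject to 3D, translate, re-project.
    With a vertical baseline the longitude of every pixel is preserved,
    so correspondences slide only along image columns.
    """
    H, W = depth.shape
    # Pixel grid -> spherical angles (conventions are an assumption).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    lon = (u / W - 0.5) * 2 * np.pi
    lat = (v / H - 0.5) * np.pi
    # Unproject to Cartesian coordinates using the per-pixel radial depth.
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)
    # Translate along the vertical axis and re-project.
    y_t = y - baseline
    r_t = np.sqrt(x**2 + y_t**2 + z**2)
    lat_t = np.arcsin(np.clip(y_t / np.maximum(r_t, 1e-8), -1.0, 1.0))
    lon_t = np.arctan2(x, z)                 # unchanged for a vertical baseline
    u_t = (lon_t / (2 * np.pi) + 0.5) * W
    v_t = (lat_t / np.pi + 0.5) * H
    return u_t, v_t                          # sampling grid for backward warping
```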
The latest developments in 3D capturing, processing, and rendering provide the means to unlock novel 3D application pathways. The main elements of an integrated platform targeting tele-immersion and future 3D applications are described in this paper, addressing the tasks of real-time capturing, robust 3D human shape/appearance reconstruction, and skeleton-based motion tracking. More specifically, the details of a multiple RGB-depth (RGB-D) capturing system are first given, along with a novel sensor calibration method. A robust, fast reconstruction method from multiple RGB-D streams is then proposed, based on an enhanced variation of the volumetric Fourier transform-based method, parallelized on the Graphics Processing Unit and accompanied by an appropriate texture-mapping algorithm. On top of that, given the lack of relevant objective evaluation methods, a novel framework is proposed for the quantitative evaluation of real-time 3D reconstruction systems. Finally, a generic, multiple depth stream-based method for accurate real-time human skeleton tracking is proposed. Detailed experimental results with multi-Kinect2 datasets verify the validity of our arguments and the effectiveness of the proposed system and methodologies.
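The multi-sensor pipeline above depends on calibrated extrinsics to bring every RGB-D stream into a common world frame before volumetric reconstruction. The minimal sketch below shows only that merging step under a simple pinhole model with hypothetical names; it is not the paper's Fourier transform-based reconstruction method.

```python
import numpy as np

def merge_depth_streams(depth_maps, intrinsics, extrinsics):
    """Merge depth maps from several calibrated RGB-D sensors into a single
    point cloud expressed in a common world frame.

    depth_maps : list of (H, W) arrays, depth in metres (0 = invalid)
    intrinsics : list of (3, 3) camera matrices K
    extrinsics : list of (4, 4) sensor-to-world transforms from calibration
    """
    clouds = []
    for depth, K, T in zip(depth_maps, intrinsics, extrinsics):
        H, W = depth.shape
        u, v = np.meshgrid(np.arange(W), np.arange(H))
        valid = depth > 0
        z = depth[valid]
        # Back-project pixels through the pinhole model of each sensor.
        x = (u[valid] - K[0, 2]) * z / K[0, 0]
        y = (v[valid] - K[1, 2]) * z / K[1, 1]
        pts = np.stack([x, y, z, np.ones_like(z)], axis=0)   # (4, N)
        clouds.append((T @ pts)[:3].T)                       # to world frame
    return np.concatenate(clouds, axis=0)
```

The merged cloud would then feed the volumetric surface reconstruction and texture-mapping stages described in the abstract.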
Usage of Unmanned Aerial Vehicles (UAVs) is growing rapidly in a wide range of consumer applications, as they prove to be both autonomous and flexible in a variety of environments and tasks. However, this versatility and ease of use also bring a rapid evolution of threats by malicious actors that can use UAVs for criminal activities, converting them into passive or active threats. The need to protect critical infrastructures and important events from such threats has driven advances in counter-UAV (c-UAV) applications. Nowadays, c-UAV applications offer systems that comprise a multi-sensory arsenal, often including electro-optical, thermal, acoustic, radar and radio-frequency sensors, whose information can be fused to increase the confidence of threat identification. Nevertheless, real-time surveillance is a cumbersome process, yet it is absolutely essential for promptly detecting the occurrence of adverse events or conditions. To that end, many challenging tasks arise, such as object detection, classification, multi-object tracking and multi-sensor information fusion. In recent years, researchers have utilized deep learning-based methodologies to tackle these tasks for generic objects and have made noteworthy progress, yet applying deep learning to UAV detection and classification is still considered a novel concept. Therefore, the need has emerged to present a complete overview of deep learning technologies applied to c-UAV-related tasks on multi-sensor data. The aim of this paper is to describe deep learning advances on c-UAV-related tasks when applied to data originating from many different sensors, as well as multi-sensor information fusion. This survey may help in making recommendations and improvements to c-UAV applications in the future.
Recent film releases such as Avatar have revolutionized cinema by combining 3D technology, content production, and real actors, leading to the creation of a new genre at the outset of the 2010s. The success of 3D cinema has led several major consumer electronics manufacturers to launch 3D-capable televisions and broadcasters to offer 3D content. Today's 3DTV technology is based on stereo vision, which presents left- and right-eye images through temporal or spatial multiplexing to viewers wearing a pair of glasses. The next step in 3DTV development will likely be a multiview autostereoscopic imaging system, which will record and present many pairs of video signals on a display and will not require viewers to wear glasses. 1,2 Although researchers have proposed several autostereoscopic displays, the resolution and viewing positions are still limited. Furthermore, stereo and multiview technologies rely on the brain to fuse the two disparate images to create the 3D effect. As a result, such systems tend to cause eye strain, fatigue, and headaches after prolonged viewing, because users are required to focus on the screen plane (accommodation) but to converge their eyes to a point in space in a different plane (convergence), producing unnatural viewing. Recent advances in digital technology have eliminated some of these human factors, but some intrinsic eye fatigue will always exist with stereoscopic 3D technology. 3 These facts have motivated researchers to seek alternative means of capturing true 3D content, most notably holography and holoscopic imaging. Due to the interference of the coherent light fields required to record holograms, their use is still limited and mostly confined to research laboratories. Holoscopic imaging (also referred to as integral imaging), on the other hand, in its simplest form consists of a lens array mated to a digital sensor, with each lens capturing perspective views of the scene. 49 In this case, the light field does not need to be coherent, so holoscopic color images can be obtained with full parallax. This conveniently lets us adopt more conventional live capture and display procedures. Furthermore, 3D holoscopic imaging offers fatigue-free viewing to more than one person, independent of the viewers' positions. Due to recent advances in theory and microlens manufacturing, 3D holoscopic imaging is becoming a practical, prospective 3D display technology and is thus attracting much interest in the 3D area. The 3D Live Immerse Video-Audio Interactive Multimedia (3D Vivant, www.3dvivant.eu) project, funded by the EU-FP7 ICT-4-1.5 Networked Media and 3D Internet programme, has proposed advances in 3D holoscopic imaging technology for the capture, representation, processing, and display of 3D holoscopic content that overcome most of the aforementioned restrictions faced by traditional 3D technologies. This article presents our work as part of the 3D Vivant project. 3D Holoscopic Content Generation: The 3D holoscopic imaging technique creates and represents a true volume spatial optical model of the objec...
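As a rough illustration of how a holoscopic (integral) capture encodes multiple perspectives behind a micro-lens array, the sketch below extracts one viewpoint image from the raw sensor data by sampling the same pixel position under every micro-lens. The function name, square-lens assumption, and lack of lens-distortion handling are simplifications for illustration, not the 3D Vivant processing pipeline.

```python
import numpy as np

def extract_viewpoint_image(holoscopic, lens_px, du, dv):
    """Extract one viewpoint (sub-aperture) image from a holoscopic /
    integral image captured behind a square micro-lens array.

    holoscopic : (H, W, 3) raw sensor image
    lens_px    : number of pixels under each micro-lens (assumed square)
    du, dv     : pixel offset within each micro-image selecting the viewpoint
    """
    H, W, _ = holoscopic.shape
    # Taking the pixel at the same relative position under every lens
    # assembles one view of the scene; varying (du, dv) sweeps the viewpoint.
    view = holoscopic[dv::lens_px, du::lens_px, :]
    return view[: H // lens_px, : W // lens_px, :]
```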