Recent work on depth estimation has so far focused only on projective images, ignoring 360° content, which is now produced increasingly often and with growing ease. We show that monocular depth estimation models trained on traditional images produce sub-optimal results on omnidirectional images, which highlights the need to train directly on 360° datasets; such datasets, however, are hard to acquire. In this work, we circumvent the challenges of acquiring high-quality 360° datasets with ground-truth depth annotations by re-using recently released large-scale 3D datasets and re-purposing them for 360° via rendering. The resulting dataset, which is considerably larger than similar projective datasets, is publicly offered to the community to enable future research in this direction. We use this dataset to learn the task of depth estimation from 360° images in an end-to-end fashion. We show promising results on our synthesized data as well as on unseen realistic images.
Learning-based approaches to depth perception are limited by the availability of clean training data. This has led to the use of view synthesis as an indirect objective for learning depth estimation, which enables efficient data acquisition procedures. Nonetheless, most research focuses on pinhole-based monocular vision, and few works present results for omnidirectional input. In this work, we explore spherical view synthesis for learning monocular 360° depth in a self-supervised manner and demonstrate its feasibility. Under a purely geometrically derived formulation, we present results for horizontal and vertical baselines, as well as for the trinocular case. Further, we show how to better exploit the expressiveness of traditional CNNs, in an efficient manner, when they are applied to the equirectangular domain. Finally, given the availability of ground-truth depth data, our work is uniquely positioned to compare view synthesis against direct supervision in a consistent and fair manner. The results indicate that alternative research directions might be better suited to enable higher-quality depth perception. Our data, models and code are publicly available at https
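To make the geometry behind spherical view synthesis more concrete, the following is a minimal sketch (not the authors' implementation) of how each pixel of an equirectangular depth map can be lifted to 3D and reprojected into a second 360° view displaced by a pure translational baseline. The coordinate convention (y up, z forward, longitude measured from +z towards +x), the absence of any rotation between views, and the function names `pixel_to_ray`, `ray_to_pixel` and `reproject` are assumptions chosen for illustration. A full self-supervised pipeline would additionally resample the images at these coordinates and compare them photometrically.

```python
import numpy as np


def pixel_to_ray(u, v, width, height):
    """Map equirectangular pixel centres to unit ray directions (assumed y-up, z-forward)."""
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi    # longitude in (-pi, pi)
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi   # latitude in (-pi/2, pi/2)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)


def ray_to_pixel(points, width, height):
    """Project 3D points (any length) back to continuous equirectangular coordinates."""
    d = points / np.linalg.norm(points, axis=-1, keepdims=True)
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
    u = (lon + np.pi) / (2.0 * np.pi) * width - 0.5
    v = (np.pi / 2.0 - lat) / np.pi * height - 0.5
    return u, v


def reproject(depth, baseline):
    """Return where each source pixel lands in a view translated by `baseline` (no rotation)."""
    height, width = depth.shape
    v, u = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    rays = pixel_to_ray(u, v, width, height)
    points = rays * depth[..., None]             # 3D points in the source camera frame
    points_in_target = points - np.asarray(baseline, dtype=float)
    return ray_to_pixel(points_in_target, width, height)
```

With a zero baseline every pixel maps back onto itself, while a vertical baseline (a translation along +y) moves every point lower in the target image, which is the displacement pattern a vertically stacked 360° pair would exhibit.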
Multi-view capture systems are complex to engineer. They require technical knowledge to install and complex processes to set up. However, with ongoing developments in new production methods, we are now in a position to generate high-quality, realistic 3D assets. Nonetheless, the capture systems developed alongside these methods are intertwined with them, rely on custom solutions, and are seldom, if ever, publicly available. We design, develop and publicly offer a multi-view capture system based on the latest RGB-D sensor technology. We also develop a portable and easy-to-use external calibration process to allow for its widespread use.
Since December 2019, the world has been devastated by the Coronavirus Disease 2019 (COVID-19) pandemic. Emergency Departments have faced urgent situations in which clinical experts, lacking long experience and mature means of fighting COVID-19, have had to rapidly decide on the most appropriate patient treatment. In this context, we present an artificial intelligence tool for effective and efficient Computed Tomography (CT)-based risk assessment to improve treatment and patient care. Specifically, we introduce a data-driven approach built on volume-of-interest-aware deep neural networks for automatic COVID-19 patient risk assessment (discharged, hospitalized, intensive care unit), based on lung infection quantification through segmentation followed by CT classification. We tackle the high and varying dimensionality of the CT input by detecting and analyzing only a sub-volume of the CT, the Volume-of-Interest (VoI). Unlike recent strategies that either consider infected CT slices without requiring any spatial coherence between them, or use the whole lung volume after abrupt and lossy down-sampling, we assess only the “most infected volume”, composed of slices at their original spatial resolution. To this end, we create, present and publish a new labeled and annotated CT dataset of 626 CT samples from COVID-19 patients. Comparison against these alternative strategies demonstrates the effectiveness of our VoI-based approach. Evaluated on balanced data, our patient risk assessment reaches 88.88% accuracy, 89.77% sensitivity, 94.73% specificity and an 88.88% F1-score.
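The idea of keeping only the “most infected volume” at full resolution can be illustrated with a simple sliding-window search over an infection mask. The sketch below is hypothetical and is not the paper's VoI detection procedure: it assumes a binary lung-infection segmentation of shape (slices, height, width) and a fixed window length `num_slices`, and returns the contiguous run of slices containing the most infected voxels. The function name `select_voi` is illustrative.

```python
import numpy as np


def select_voi(infection_mask: np.ndarray, num_slices: int) -> tuple:
    """Return (start, end) of the contiguous slice window with the most infected voxels.

    Hypothetical criterion: infected-voxel count per window, computed with a
    prefix sum so every window is scored in O(1) after one pass over the mask.
    """
    depth = infection_mask.shape[0]
    if not 0 < num_slices <= depth:
        raise ValueError("num_slices must be between 1 and the number of slices")
    per_slice = infection_mask.reshape(depth, -1).sum(axis=1)   # infected voxels per slice
    prefix = np.concatenate(([0], np.cumsum(per_slice)))
    window_sums = prefix[num_slices:] - prefix[:-num_slices]    # one score per start index
    start = int(np.argmax(window_sums))
    return start, start + num_slices


# Cropping along the slice axis keeps each slice at its original spatial resolution,
# in contrast to down-sampling the whole lung volume.
# start, end = select_voi(mask, num_slices)
# voi = ct_volume[start:end]
```

Because the crop only selects slices, the in-plane resolution of the CT is untouched; the fixed window length is an assumption made here so that the classifier receives an input of constant size.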
We describe four fundamental challenges that complex real-life Virtual Reality (VR) productions face today, namely multi-camera management, quality control, automatic cinematographic annotation and 360° depth estimation, and present an integrated solution, called Hyper 360, to address them. We demonstrate our solution and its evaluation in the context of practical productions and present the related results.