[Figure 1 panels: demosaicking comparison, FlexISP 32.5 dB / Adobe CR 31.7 dB / ours 38.4 dB; joint denoising-demosaicking comparison, reference / noisy / [Condat 2012] 32.4 dB / ours 33.3 dB.]
Figure 1: We propose a data-driven approach for jointly solving denoising and demosaicking. By carefully designing a dataset made of rare but challenging image features, we train a neural network that outperforms both state-of-the-art and commercial solutions on demosaicking alone (group of images on the left; insets show error maps) and on joint denoising-demosaicking (on the right; insets show close-ups). The benefit of our method is most noticeable on difficult image structures that lead to moiré or zippering of the edges.
Modern camera calibration and multiview stereo techniques enable users to smoothly navigate between different views of a scene captured using standard cameras. The underlying automatic 3D reconstruction methods work well for buildings and regular structures but often fail on vegetation, vehicles, and other complex geometry present in everyday urban scenes. Consequently, missing depth information makes Image-Based Rendering (IBR) for such scenes very challenging. Our goal is to provide plausible free-viewpoint navigation for such datasets. To do this, we introduce a new IBR algorithm that is robust to missing or unreliable geometry, providing plausible novel views even in regions quite far from the input camera positions. We first oversegment the input images, creating superpixels of homogeneous color content that tend to preserve depth discontinuities. We then introduce a depth-synthesis approach for poorly reconstructed regions, based on a graph structure over the oversegmentation and an appropriate traversal of that graph. The superpixels, augmented with synthesized depth, allow us to define a local shape-preserving warp that compensates for inaccurate depth. Our rendering algorithm blends the warped images to generate plausible image-based novel views for our challenging target scenes. Our results demonstrate real-time novel view synthesis for multiple challenging scenes with significant depth complexity, providing a convincing immersive navigation experience.
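To make the depth-synthesis idea concrete, here is a minimal sketch assuming a superpixel adjacency graph and a simple breadth-first propagation rule; the names (`synthesize_depth`, `adjacency`, `depth`) and the neighbor-averaging rule are our illustrative assumptions, not the paper's exact traversal.

```python
# Hypothetical sketch: propagate depth from well-reconstructed superpixels
# to poorly reconstructed ones over an adjacency graph. The BFS order and
# the mean-of-known-neighbors rule are assumptions for illustration.
from collections import deque

def synthesize_depth(adjacency, depth):
    """adjacency: {superpixel_id: set of neighbor ids} (symmetric).
    depth: {superpixel_id: float or None}; None = no reliable depth.
    Returns a dict in which every superpixel has an assigned depth."""
    out = dict(depth)
    # Seed the frontier with superpixels that already have reliable depth.
    frontier = deque(s for s, d in out.items() if d is not None)
    while frontier:
        s = frontier.popleft()
        for n in adjacency[s]:
            if out[n] is None:
                # Assign the mean depth of already-resolved neighbors.
                known = [out[m] for m in adjacency[n] if out[m] is not None]
                out[n] = sum(known) / len(known)
                frontier.append(n)
    return out
```

In this simplified form, depth flows outward from reliably reconstructed superpixels, so poorly reconstructed regions inherit plausible depth from their neighbors while superpixel boundaries keep depth discontinuities intact.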
We present results of automatic segmentation algorithms for our datasets (see Sec. 1) and implementation details of the 3D point processing step of our approach (see Sec. 2).

Silhouette Extraction

A comparison between different segmentation algorithms and manual silhouette annotation is shown in Fig. 1. All results were generated using code provided by the authors. The first row shows one of the input images of our datasets. The second row shows the manually annotated silhouettes that were used to generate the image-based rendering results presented in the paper. The third row shows the final result of occlusion boundary extraction from a single image by Hoiem et al. [HSEH07]. The fourth and fifth rows show the soft and binary edge maps obtained using a combination of the methods of Arbelaez et al.

We observed that [HSEH07] works well in some cases but suffers from inaccurate localization, false positives, and missed edges. We tried erasing false matches and scribbling missed edges manually, but this took more time than completely manual annotation of silhouettes. We also converted the edge maps to polylines using contour tracing [TC89] and polygon approximation [DP72] to check whether user interaction on polygonal curves is easier. However, contour tracing becomes ambiguous when many contours intersect, and the result has too many doubled line segments and too much noise (see Fig. 2(c)) to fix in less than the 40 seconds per image required for manual annotation. The doubled lines appear because the chaining algorithm tries to close each contour by walking back and forth.

We tried using depth cues from the reconstruction to eliminate false positives. For this, we oversegmented the images into thousands of contiguous regions, called 'superpixels' (Fig. 3(b)), using [FH04], and used the available depth to assign a depth to every superpixel. The boundary between two superpixels is considered a silhouette if they lie at substantially different depths (Fig. 3(c)). However, we observed that these 'depth images' have very noisy silhouettes because not all superpixels have robust depth information.

We believe that combining depth and multi-view information with the best elements of previous segmentation algorithms could allow the development of a more robust algorithm. While a completely automated approach is very hard, we are confident that such an approach would greatly reduce the overall manual interaction time.

3D Point Selection

The goal of this step is to decimate the reconstructed point set down to a sparse set distributed uniformly over the image, fill in regions with few or no reconstructed points, and remove erroneous points near silhouettes. The 3D point selection described in Sec. 4.2 of the paper involves the following steps:

Decimation. We splat the point set with a large splat size and the depth test enabled, and count the number of pixels that each point covers after the depth test. We then select a subset of the desired size that covers the maximum number of pixels (see Fig. 4(a),(b)); a greedy sketch is given below. The splat size is not critical as long as it is not too small (...
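As an illustrative sketch of the decimation step, subset selection can be cast as greedy maximum coverage. Everything below is an assumption for illustration: the `coverage` map (point id to the set of pixels its splat covers after the depth test), the function name `decimate`, and the greedy rule itself, which we use as a tractable stand-in since exact maximum-coverage selection is NP-hard.

```python
# Greedy max-coverage sketch of point-set decimation (assumed, not the
# paper's exact procedure). coverage: {point_id: set of covered pixels}.
def decimate(coverage, k):
    coverage = dict(coverage)  # don't mutate the caller's map
    covered, selected = set(), []
    for _ in range(k):
        if not coverage:
            break
        # Pick the point whose splat adds the most not-yet-covered pixels.
        best = max(coverage, key=lambda p: len(coverage[p] - covered))
        if not coverage[best] - covered:
            break  # remaining points add nothing new
        selected.append(best)
        covered |= coverage.pop(best)
    return selected
```

Each iteration picks the point whose splat adds the most not-yet-covered pixels, which naturally spreads the selected subset uniformly over the image.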
Background

Virtual reality (VR) opens up a vast number of possibilities in many domains of therapy. The primary objective of the present study was to evaluate the acceptability for elderly subjects of a VR experience using the image-based rendering virtual environment (IBVE) approach, and secondly to test the hypothesis that visual cues using VR may enhance the generation of autobiographical memories.

Methods

Eighteen healthy volunteers (mean age 68.2 years) presenting memory complaints, with a Mini-Mental State Examination score higher than 27 and no history of neuropsychiatric disease, were included. Participants were asked to perform an autobiographical fluency task in four conditions. The first condition was a baseline grey screen, the second was a photograph of a well-known location in the participant's home city (FamPhoto), and the last two conditions displayed VR, i.e., a familiar image-based virtual environment (FamIBVE) consisting of an image-based representation of a known landmark square in the center of the city of experimentation (Nice), and an unknown image-based virtual environment (UnknoIBVE), captured in a public housing neighborhood containing unrecognizable building fronts. After each of the four experimental conditions, participants filled in self-report questionnaires to assess task acceptability (levels of emotion, motivation, security, fatigue, and familiarity). CyberSickness and Presence questionnaires were also administered after the two VR conditions. Autobiographical memory was assessed using a verbal fluency task, and the quality of recollection was assessed using the "remember/know" procedure.

Results

All subjects completed the experiment. Sense of security and fatigue did not differ significantly between the conditions with and without VR. The FamPhoto condition yielded a higher emotion score than the other conditions (P<0.05). The CyberSickness questionnaire showed that participants did not experience sickness during the experiment across the VR conditions. VR stimulates autobiographical memory, as demonstrated by the increased total number of responses on the autobiographical fluency task and the increased number of conscious recollections of memories for familiar versus unknown scenes (P<0.01).

Conclusion

The study indicates that VR using the FamIBVE system is well tolerated by the elderly. VR can also stimulate recollection of autobiographical memories and convey familiarity of a given scene, which is an essential requirement for the use of VR in reminiscence therapy.
We introduce a method to compute intrinsic images for a multi-view set of outdoor photos with cast shadows, taken under the same lighting. We use an automatic 3D reconstruction from these photos and the sun direction as input, and decompose each image into reflectance and shading layers despite the inaccuracies and missing data of the 3D model. Our approach is based on two key ideas. First, we progressively improve the accuracy of the parameters of our image formation model by performing iterative estimation, combining 3D lighting simulation with 2D image optimization methods. Second, we use the image formation model to express reflectance as a function of discrete visibility values for shadow and light, which allows us to introduce a robust visibility classifier for pairs of points in a scene. This classifier is used for shadow labelling, allowing us to compute high-quality reflectance and shading layers. Our multi-view intrinsic decomposition is of sufficient quality to allow relighting of the input images. We create shadow-caster geometry that preserves shadow silhouettes and, using the intrinsic layers, perform multi-view relighting with moving cast shadows. We present results on several multi-view datasets, and show how it is now possible to perform image-based rendering with changing illumination conditions.
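For context, a typical outdoor image formation model of the kind this abstract alludes to (our notation; a generic sketch, not necessarily the paper's exact model) makes explicit how reflectance becomes a function of the discrete sun-visibility value:

```latex
% Generic outdoor image formation model (notation assumed, not the paper's):
% I(p): observed radiance at pixel p, R(p): reflectance,
% v(p) \in \{0,1\}: discrete sun visibility, S_sun, S_sky: shading terms.
I(p) = R(p)\,\bigl( v(p)\, S_{\mathrm{sun}}(p) + S_{\mathrm{sky}}(p) \bigr)
\qquad\Longrightarrow\qquad
R(p) = \frac{I(p)}{v(p)\, S_{\mathrm{sun}}(p) + S_{\mathrm{sky}}(p)}
```

Under a model of this form, classifying the discrete visibility v(p) (shadow versus lit) pins down the denominator, which is why robust visibility labelling directly yields high-quality reflectance layers.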