Abstract. State-of-the-art automated image orientation (Structure from Motion) and dense image matching (Multiple View Stereo) methods commonly used to produce 3D information from 2D images can generate 3D results – such as point clouds or meshes – of varying geometric and visual quality. Pipelines are generally robust and reliable, mostly capable of processing even large sets of unordered images, yet the final results often lack completeness and accuracy, especially when dealing with real-world cases where objects are typically characterized by complex geometries and textureless surfaces, and obstacles or occluded areas may also occur. In this study we investigate three commonly used open-source solutions, namely COLMAP, OpenMVG+OpenMVS and AliceVision, evaluating their results under diverse large-scale scenarios. Comparisons and a critical evaluation of the image orientation and dense point cloud generation algorithms are performed with respect to the corresponding ground truth data. The presented FBK-3DOM datasets are available for research purposes.
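The comparison against ground truth mentioned above typically reduces to nearest-neighbour distance statistics between a reconstructed cloud and a reference cloud. The following is a minimal sketch of that idea; `cloud_to_cloud_error` and the toy arrays are hypothetical names for illustration, and a real evaluation would use registered large-scale clouds and a k-d tree instead of brute force:

```python
import numpy as np

def cloud_to_cloud_error(reconstructed, ground_truth):
    """Mean and RMS of nearest-neighbour distances from each
    reconstructed point to the ground-truth cloud (brute force;
    a k-d tree is needed for realistically sized clouds)."""
    diffs = reconstructed[:, None, :] - ground_truth[None, :, :]
    dists = np.linalg.norm(diffs, axis=2).min(axis=1)
    return dists.mean(), np.sqrt((dists ** 2).mean())

# Toy example: a uniformly offset copy of a tiny ground-truth cloud.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
rec = gt + 0.01
mean_d, rms_d = cloud_to_cloud_error(rec, gt)
```

Reporting both mean and RMS is common because the RMS is more sensitive to the gross outliers that dense matching tends to produce in occluded regions.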
Conventional multi-view stereo (MVS) approaches based on photo-consistency measures are generally robust, yet often fail to compute valid per-pixel depth estimates in low-textured areas of the scene. In this study, a novel approach is proposed to tackle this challenge by leveraging semantic priors in a PatchMatch-based MVS pipeline in order to increase confidence and support depth and normal map estimation. Semantic class labels on image pixels are used to impose class-specific geometric constraints during multi-view stereo, optimising depth estimation in the weakly supported, textureless areas commonly present in urban scenarios of building facades, indoor scenes, or aerial datasets. Detecting dominant shapes, e.g. planes, with RANSAC, an adjusted cost function is introduced that combines and weighs both photometric and semantic scores, thus propagating more accurate depth estimates. Being adaptive, it fills in apparent information gaps and smooths local roughness in problematic regions while preserving important details. Experiments on benchmark and custom datasets demonstrate the effectiveness of the presented approach.
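The idea of weighing photometric and semantic scores in a single cost can be illustrated with a minimal sketch; `combined_cost`, the string labels and the weight `w_sem` are assumptions for illustration, not the paper's actual formulation:

```python
def combined_cost(photo_cost, sem_label_ref, sem_label_src, w_sem=0.3):
    """Blend a photometric matching cost (lower is better) with a
    semantic consistency penalty: a depth hypothesis that projects
    onto a different class than the reference pixel is penalised,
    which supports propagation across textureless same-class regions
    such as planar facades."""
    sem_cost = 0.0 if sem_label_ref == sem_label_src else 1.0
    return (1.0 - w_sem) * photo_cost + w_sem * sem_cost
```

In a PatchMatch loop this cost would replace the plain photo-consistency score when ranking candidate depth/normal hypotheses, so that semantically consistent hypotheses win in ambiguous, low-texture regions.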
<p><strong>Abstract.</strong> Automatic semantic segmentation of images is becoming a very prominent research field, with many promising and reliable solutions already available. Labelled images as input for the photogrammetric pipeline have enormous potential to improve 3D reconstruction results. To support this argument, in this work we discuss the contribution of image semantic labelling to image-based 3D reconstruction in photogrammetry. We experiment with semantic information at various steps, from feature matching to dense 3D reconstruction. Labelling in 2D is considered an easier task in terms of data availability and algorithm maturity. However, since semantic labelling of all the images involved in the reconstruction may be a costly, laborious and time-consuming task, we propose to use a deep learning architecture to automatically generate semantically segmented images. To this end, we have trained a Convolutional Neural Network (CNN) on historic building façade images, a dataset that will be further enriched in the future. The first results of this study are promising, showing improved quality of the 3D reconstruction and the possibility to transfer labelling results from 2D to 3D.</p>
ABSTRACT: Outdoor large-scale cultural sites are highly sensitive to environmental, natural and human-made factors, implying an imminent need for a spatio-temporal assessment to identify regions of potential cultural interest (material degradation, structural change, conservation). On the other hand, quite different actors are involved in Cultural Heritage research (archaeologists, curators, conservators, simple users), each with diverse needs. All these considerations suggest that 5D modelling (3D geometry plus time plus levels of detail) is ideally required for the preservation and assessment of outdoor large-scale cultural sites; it is currently implemented as a simple aggregation of 3D digital models at different times and levels of detail. The main bottleneck of such an approach is its complexity, making 5D modelling impossible to validate in real-life conditions. In this paper, a cost-effective and affordable framework for 5D modelling is proposed, based on a spatio-temporally dependent aggregation of 3D digital models that incorporates a predictive assessment procedure to indicate which regions (surfaces) of an object should be reconstructed at higher levels of detail at the next time instances and which at lower ones. In this way, dynamic change-history maps are created, indicating the spatial probability that a region will need further 3D modelling at forthcoming instances. Using these maps, a predictive assessment can be made, that is, surfaces within the objects can be localized where a high-accuracy reconstruction process needs to be activated at forthcoming time instances. The proposed 5D Digital Cultural Heritage Model (5D-DCHM) is implemented using open interoperable standards based on the CityGML framework, which also allows the description of additional semantic metadata. Visualization aspects are also supported to allow easy manipulation, interaction and representation of the 5D-DCHM geometry and the respective semantic information. The open-source 3DCityDB, incorporating a PostgreSQL geo-database, is used to manage and manipulate the 3D data and their semantics.
Abstract: The ability of High Dynamic Range (HDR) imaging to capture the full range of lighting in a scene has prompted an increase in its use for Cultural Heritage (CH) applications. Photogrammetric techniques allow the semi-automatic production of three-dimensional (3D) models from a sequence of images. Current photogrammetric methods are not always effective in reconstructing images under harsh lighting conditions, as significant geometric details may not have been captured accurately within under- and over-exposed regions of the image. HDR imaging offers the possibility to overcome this limitation; however, the HDR images need to be tone-mapped before they can be used within existing photogrammetric algorithms. In this paper we evaluate four different HDR tone-mapping operators (TMOs) that have been used to convert raw HDR images into a format suitable for state-of-the-art algorithms, and in particular keypoint detection techniques. The evaluation criteria used are the number of keypoints, the number of valid matches achieved and the repeatability rate. The comparison considers two local and two global TMOs. HDR data from four CH sites were used: Kaisariani Monastery (Greece), Asinou Church (Cyprus), Château des Baux (France) and Buonconsiglio Castle (Italy). Key words: high dynamic range (HDR) imaging, HDR tone-mapping, keypoint detection, image-based 3D reconstruction
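The repeatability rate used as one of the evaluation criteria above can be sketched as follows, assuming the compared images are already aligned; the function name `repeatability` and the pixel tolerance are hypothetical choices for illustration:

```python
def repeatability(keypoints_a, keypoints_b, tol=1.5):
    """Fraction of keypoints detected in image A that reappear
    within `tol` pixels in image B. This is a simplified form of
    the repeatability criterion used to compare detectors across
    differently tone-mapped versions of the same HDR exposure."""
    if not keypoints_a:
        return 0.0
    matched = sum(
        1 for (xa, ya) in keypoints_a
        if any((xa - xb) ** 2 + (ya - yb) ** 2 <= tol ** 2
               for (xb, yb) in keypoints_b)
    )
    return matched / len(keypoints_a)
```

A TMO that preserves local contrast in the shadows and highlights will tend to score higher on all three criteria (keypoint count, valid matches, repeatability) than one that compresses those regions.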
The image-based 3D reconstruction pipeline aims to generate complete digital representations of the recorded scene, often in the form of 3D surfaces. These surface or mesh models are required to be highly detailed and sufficiently accurate, especially for metric applications. Surface generation can be considered a problem integrated into the complete 3D reconstruction workflow, so that visibility information (pixel similarity and image orientation) is leveraged in the meshing procedure, contributing to an optimal photo-consistent mesh. Other methods tackle the problem as an independent, subsequent step, generating a mesh model from a dense 3D point cloud or even from depth maps, discarding the input image information. Out of the vast number of approaches to 3D surface generation, in this study we considered three state-of-the-art methods. Experiments were performed on benchmark and proprietary datasets of varying nature, scale, shape, image resolution and network design. Several evaluation metrics were introduced to provide a qualitative and quantitative assessment of the results.
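One common quantitative metric for assessing a generated surface is completeness against ground truth. The sketch below illustrates the idea under stated assumptions: the function name, the approximation of the mesh by sampled points, and the threshold `tau` are all choices made here for illustration, not the metrics of the study itself:

```python
import numpy as np

def completeness(gt_points, mesh_samples, tau=0.05):
    """Share of ground-truth points lying within distance `tau` of
    the reconstructed surface, with the surface approximated by
    points sampled on the mesh. Brute-force distances; real
    evaluations use a k-d tree or exact point-to-triangle distance."""
    d = np.linalg.norm(
        gt_points[:, None, :] - mesh_samples[None, :, :], axis=2
    ).min(axis=1)
    return float((d <= tau).mean())
```

Completeness is usually reported alongside accuracy (the reverse direction, reconstruction-to-ground-truth), since a mesh can be accurate where it exists yet miss large parts of the scene.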
ABSTRACT: Cultural Heritage all over the world is at high risk. Natural and human activities endanger the current state of monuments and sites, and many of them have already been destroyed, especially in recent years. Preventive actions are of utmost importance for the protection of human memory and the preservation of the irreplaceable. These actions may be carried out either in situ or virtually. Very often, in situ preventive, protective or restoration actions are difficult or even impossible, e.g. in cases of earthquakes, fires or war activity. Digital preservation of cultural heritage is a challenging task within the photogrammetry and computer vision communities, as efforts are made to collect digital data, especially of the monuments that are at high risk. Visiting the field for data acquisition is not always feasible. To overcome the missing-data problem, crowdsourced imagery is used to create a visual representation of lost cultural heritage objects. Such digital representations may be 2D or 3D and definitely help preserve the memory and history of the lost heritage; sometimes they also support studies for its reconstruction. An initiative to collect imagery from the public and create a visual 3D representation of a recently destroyed, almost 150-year-old stone bridge is discussed in this study. To this end, a crowdsourcing platform has been designed, and the first images collected have been processed with SfM algorithms.
<p><strong>Abstract.</strong> Patch-based stereo is nowadays a commonly used image-based technique for dense 3D reconstruction in large-scale multi-view applications. The typical steps of such a pipeline can be summarized as stereo pair selection, depth map computation, depth map refinement and, finally, fusion, in order to generate a complete and accurate representation of the scene in 3D. In this study, we aim to support the standard dense 3D reconstruction of scenes as implemented in the open-source library OpenMVS by using semantic priors. To this end, during the depth map fusion step, along with the depth consistency check between depth maps of neighbouring views referring to the same part of the 3D scene, we impose extra semantic constraints in order to remove possible errors and selectively obtain segmented point clouds per label, boosting automation in this direction. To ensure semantic coherence between neighbouring views, additional semantic criteria can be considered, aiming to eliminate mismatches of pixels belonging to different classes.</p>
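A fusion-time check combining geometric and semantic agreement, in the spirit described above, might look like the following minimal sketch; `fuse_depth`, the voting scheme and the tolerances are hypothetical simplifications for illustration, not OpenMVS code:

```python
def fuse_depth(depths, labels, ref_label, min_consistent=2, rel_tol=0.01):
    """Keep the reference depth (first entry) only if enough
    neighbouring views agree both geometrically (relative depth
    difference within `rel_tol`) and semantically (same class
    label); otherwise reject it as an inconsistent estimate."""
    ref = depths[0]
    votes = sum(
        1 for d, label in zip(depths[1:], labels[1:])
        if abs(d - ref) / ref <= rel_tol and label == ref_label
    )
    return ref if votes >= min_consistent else None
```

Because each fused point carries the agreed class label, the same check also yields per-label segmented point clouds as a by-product of fusion.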