The Data Interpolating Variational Analysis (Diva) is a method designed to interpolate irregularly spaced, noisy data onto any desired location, in most cases onto regular grids. It combines a particular methodology, based on the minimisation of a cost function, with a numerically efficient implementation, based on a finite-element solver. The cost function penalises both the misfit between the observations and the reconstructed field and the regularity or smoothness of the field. The method bears similarities to smoothing splines, where the second derivatives of the field are also penalised. The intrinsic advantages of the method are its natural way of taking into account topographic and dynamic constraints (coasts, advection, etc.) and its capacity to handle the large data sets frequently encountered in oceanography. The method provides gridded fields in two dimensions, usually in horizontal layers. Three-dimensional fields are obtained by stacking horizontal layers. In the present work, we summarise the background of the method and describe the possible approaches to computing the error field associated with the analysis. In particular, we present new developments leading to a more consistent error estimation, obtained by determining numerically the real covariance function in Diva, which is never formulated explicitly, unlike in Optimal Interpolation. The real covariance function is obtained by two concurrent executions of Diva, the first providing the covariance for the second. With this improvement, the error field is now fully consistent with the inherent background covariance in all cases. A two-dimensional application using salinity measurements in the Mediterranean Sea is presented. Applied to these measurements, Optimal Interpolation and Diva produced very similar gridded fields (correlation: 98.6%; RMS of the difference: 0.02). The method using the real covariance produces an error field similar to that of Optimal Interpolation, except in coastal areas.
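The cost function mentioned above typically takes the following form in the Diva literature (shown here as a sketch; the weights \(\mu_i\) and coefficients \(\alpha_0, \alpha_1, \alpha_2\) are tuning parameters controlling the relative strength of the data misfit and smoothness penalties):

```latex
J[\varphi] = \sum_{i=1}^{N} \mu_i \left[ d_i - \varphi(x_i, y_i) \right]^2
  + \int_{D} \left( \alpha_2 \, \nabla\nabla\varphi : \nabla\nabla\varphi
  + \alpha_1 \, \nabla\varphi \cdot \nabla\varphi
  + \alpha_0 \, \varphi^2 \right) \, \mathrm{d}D
```

The first term penalises the misfit between the reconstructed field \(\varphi\) and the \(N\) observations \(d_i\); the integral penalises curvature, gradients, and the field itself over the domain \(D\). Because the finite-element minimisation restricts \(D\) to the sea, coastlines enter the analysis naturally, which is the topographic constraint referred to above.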
Numerous climatologies are available at different resolutions and cover various parts of the global ocean. Most of them have a resolution too low to suitably represent regional processes, and the methods used for their construction are not able to take into account the influence of physical effects (topographic constraints, boundary conditions, advection, etc.). A high-resolution atlas for temperature and salinity is developed for the northeast Atlantic Ocean on 33 depth levels. The originality of this climatology is twofold: (1) For the data set, data are collected from all major databases and aggregated into an original, duplicate-free collection richer than the World Ocean Database 2005 for the same region of interest. (2) For the method, climatological fields are constructed using the variational method Data-Interpolating Variational Analysis. The formulation of the latter allows the consideration of coastlines and bottom topography, and has a numerical cost almost independent of the number of observations. Moreover, only a few parameters, determined in an objective way, are necessary to perform an analysis. The results show overall good agreement with the widely used World Ocean Atlas, but also reveal significant improvements in coastal areas. Error maps are generated according to different theories and emphasise the importance of data coverage for the creation of such climatological fields. Automatic outlier detection is performed, and its effects on the analysis are examined. The method presented here is very general and not dependent on the region, hence it may be applied to create other regional atlases in different zones of the global ocean.
Publicly available genomes are crucial for phylogenetic and metagenomic studies, in which contaminating sequences can be the cause of major problems. This issue is expected to be especially important for Cyanobacteria because axenic strains are notoriously difficult to obtain and keep in culture. Yet, despite their great scientific interest, no data are currently available concerning the quality of publicly available cyanobacterial genomes. As reliably detecting contaminants is a complex task, we designed a pipeline combining six methods in a consensus strategy to assess the contamination level of 440 genome assemblies of Cyanobacteria. Two methods are based on published reference databases of ribosomal genes (SSU rRNA 16S and ribosomal proteins), one is indirectly based on a reference database of marker genes (CheckM), and three are based on complete genome analysis. Among those genome-wide methods, Kraken and DIAMOND blastx share the same reference database that we derived from Ensembl Bacteria, whereas CONCOCT does not require any reference database, instead relying on differences in DNA tetramer frequencies. Given that all the six methods appear to have their own strengths and limitations, we used the consensus of their rankings to infer that >5% of cyanobacterial genome assemblies are highly contaminated by foreign DNA (i.e., contaminants were detected by 5 or 6 methods). Our results will help researchers to check the quality of publicly available genomic data before use in their own analyses. Moreover, we argue that journals should make mandatory the submission of raw read data along with genome assemblies in order to facilitate the detection of contaminants in sequence databases.
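The consensus strategy described above can be sketched as a simple voting scheme: each of the six detection methods flags an assembly or not, and assemblies flagged by at least five methods are labelled highly contaminated. The method names and the example votes below are illustrative, not taken from the study's data.

```python
# Minimal sketch of a 6-method consensus vote for contamination,
# assuming each method reduces to a boolean "flagged" decision.

def consensus_level(votes):
    """Return how many methods flagged the assembly as contaminated."""
    return sum(votes.values())

def classify(votes, threshold=5):
    """Label an assembly 'highly contaminated' when >= threshold methods agree."""
    if consensus_level(votes) >= threshold:
        return "highly contaminated"
    return "low/uncertain"

# Hypothetical votes for one assembly (True = method flags contamination)
votes = {
    "SSU_rRNA_16S": True,
    "ribosomal_proteins": True,
    "CheckM": True,
    "Kraken": True,
    "DIAMOND_blastx": True,
    "CONCOCT": False,
}

print(consensus_level(votes))  # 5
print(classify(votes))         # highly contaminated
```

In practice each method produces a quantitative score rather than a boolean, so a real pipeline would first threshold or rank those scores; the voting step itself, however, is as simple as shown.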
Optical remote sensing data are now used systematically for marine ecosystem applications, such as the forcing of biological models and the operational detection of harmful algal blooms. However, applications are hampered by the incompleteness of imagery and by quality problems. The Data Interpolating Empirical Orthogonal Functions methodology (DINEOF) allows the calculation of missing data in geophysical datasets without requiring a priori knowledge of the statistics of the full dataset, and has previously been applied to SST reconstructions. This study demonstrates the reconstruction of complete space-time information for 4 years of surface chlorophyll a (CHL), total suspended matter (TSM) and sea surface temperature (SST) over the Southern North Sea (SNS) and English Channel (EC). Optimal reconstructions were obtained when synthesising the original signal into 8 modes for MERIS CHL and into 18 modes for MERIS TSM. Despite the very high proportion of missing data (70%), the variability of the original signals explained by the EOF synthesis reached 93.5% for CHL and 97.2% for TSM. For the MODIS TSM dataset, 97.5% of the original signal variability was synthesised into 14 modes. The MODIS SST dataset could be synthesised into 13 modes explaining 98% of the input signal variability. Validation of the method is achieved for 3 dates under 2 artificial clouds, by comparing reconstructed data with the excluded input information. Complete weekly and monthly averaged climatologies, suitable for use with ecosystem models, were derived from regular daily reconstructions. Error maps associated with every reconstruction were produced according to Beckers et al. (2006). Embedded in this error calculation scheme, a methodology was implemented to produce maps of outliers, allowing identification of unusual or suspicious data points compared to the global dynamics of the dataset.
Various algorithm artefacts were associated with high values in the outlier maps (undetected cloud edges, haze areas, contrails, and cloud shadows). With the production of outlier maps, the data reconstruction technique also becomes a very efficient tool for the quality control of optical remote sensing data and for change detection within large databases.
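The core DINEOF idea used in the two abstracts above can be sketched as an iterative truncated-SVD fill: missing pixels are initialised, the field is approximated with a fixed number of EOF modes, and the missing pixels are replaced by the approximation until convergence. This is a minimal sketch only; real DINEOF additionally selects the optimal number of modes by cross-validation, which is omitted here.

```python
import numpy as np

def dineof_reconstruct(X, n_modes=3, n_iter=200, tol=1e-6):
    """Iteratively fill NaN entries of a (space x time) matrix X with a
    truncated-SVD reconstruction. Sketch of the DINEOF principle; the
    cross-validated choice of n_modes is not implemented."""
    mask = np.isnan(X)
    Xf = np.where(mask, 0.0, X)          # first guess for missing points
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Xf, full_matrices=False)
        recon = U[:, :n_modes] @ np.diag(s[:n_modes]) @ Vt[:n_modes]
        # how much the estimate of the missing points moved this iteration
        delta = np.sqrt(np.mean((recon[mask] - Xf[mask]) ** 2))
        Xf[mask] = recon[mask]           # update only the missing points
        if delta < tol:
            break
    return Xf

# Hypothetical example: a rank-1 field with one pixel removed is
# recovered almost exactly with a single mode.
X = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])   # true value at (1, 2) is 4
Xm = X.copy()
Xm[1, 2] = np.nan
R = dineof_reconstruct(Xm, n_modes=1)
print(round(R[1, 2], 3))
```

Observed data points are never modified, only the masked ones, which is what makes the scheme a reconstruction rather than a smoother.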
Abstract. DINEOF (Data Interpolating Empirical Orthogonal Functions) is an EOF-based technique for the reconstruction of missing data in geophysical fields, such as those produced by clouds in sea surface temperature satellite images. A technique to reduce spurious time variability in DINEOF reconstructions is presented. Reconstructing such images within a long time series using DINEOF can lead to large discontinuities in the reconstruction. Filtering the temporal covariance matrix reduces this spurious variability, and therefore more realistic reconstructions are obtained. The approach is tested on a three-year sea surface temperature data set over the Black Sea. The effect of the filter on the temporal EOFs is presented, along with examples of the improvement achieved by the filtering in the SST reconstruction, compared with the DINEOF approach without filtering.
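The filtering step described above can be illustrated as follows: the temporal covariance matrix of the (space x time) data matrix is smoothed with a simple diffusion (Laplacian) pass before its eigenvectors, the temporal EOFs, are computed. The filter form below is illustrative only and not the exact operator of the paper; the wrap-around edge handling via `np.roll` is a simplification.

```python
import numpy as np

def filtered_temporal_eofs(X, strength=1.0, n_pass=1):
    """Sketch: smooth the (time x time) covariance of X with a diffusion
    filter, damping high-frequency temporal variability, then return the
    temporal EOFs as eigenvectors of the filtered covariance."""
    C = X.T @ X                          # temporal covariance (time x time)
    for _ in range(n_pass):
        # discrete Laplacian of C (periodic edges for simplicity)
        lap = (np.roll(C, 1, 0) + np.roll(C, -1, 0)
               + np.roll(C, 1, 1) + np.roll(C, -1, 1) - 4 * C)
        C = C + 0.25 * strength * lap    # one explicit diffusion step
    eigvals, eigvecs = np.linalg.eigh(C)
    # return modes ordered from largest to smallest eigenvalue
    return C, eigvals[::-1], eigvecs[:, ::-1]
```

Because the diffusion step is applied symmetrically along both time axes, the filtered matrix remains a valid symmetric covariance, so the eigendecomposition still yields an orthogonal set of temporal modes.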