Quantifying how close two datasets are to each other is a common and necessary undertaking in scientific research. The Pearson product-moment correlation coefficient r is a widely used measure of the degree of linear dependence between two data series, but it gives no indication of how similar the values of these series are in magnitude. Although a number of indices have been proposed to compare a dataset with a reference, only a few are available to compare two datasets of equivalent (or unknown) reliability. After a brief review and numerical tests of the metrics designed to accomplish this task, this paper shows how an index proposed by Mielke can, with a minor modification, satisfy a set of desired properties, namely being dimensionless, bounded, symmetric, easy to compute and directly interpretable with respect to r. We thus show that this index can be considered a natural extension of r that downregulates the value of r according to the bias between the analysed datasets. The paper also proposes an effective way, based on eigendecompositions, to disentangle the systematic and unsystematic contributions to this agreement. The use and value of the index are illustrated on both synthetic and real datasets.
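A minimal sketch of an index with the properties described above, assuming the modified Mielke index takes the common mean-squared-deviation form (the exact normalisation in the published version may differ):

```python
import numpy as np

def lambda_index(x, y):
    """Agreement coefficient bounded in [-1, 1]: behaves like Pearson's r
    but is scaled down by the bias between the two series.

    Illustrative sketch only; details may differ from the paper's index.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()             # population variances (ddof=0)
    cxy = ((x - mx) * (y - my)).mean()    # covariance
    # kappa keeps the index bounded when the correlation is negative
    kappa = 0.0 if cxy >= 0 else 2.0 * abs(cxy)
    mse = ((x - y) ** 2).mean()           # mean squared deviation
    return 1.0 - mse / (vx + vy + (mx - my) ** 2 + kappa)
```

With zero bias and equal variances the index equals r; any additive or multiplicative bias between the two series lowers it below r, which is the "downregulation" behaviour described in the abstract.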
For food crisis early warning purposes, coarse-spatial-resolution NDVI data are widely used to monitor vegetation conditions in near real-time (NRT). Different types of NDVI anomalies are typically employed to assess the current state of crops and rangelands compared with previous years. Timeliness and accuracy of such anomalies are critical factors for effective monitoring. Temporal smoothing can efficiently reduce noise and cloud contamination in the time series of historical observations, where data points are available both before and after each observation to be smoothed. With NRT data, smoothing methods are adapted to cope with the unbalanced availability of data before and after the most recent data points. These NRT approaches provide successive updates of the estimate of the same data point as more observations become available. Anomalies compare the current NDVI value with statistics (e.g. indicators of central tendency and dispersion) extracted from the historical archive of observations. With multiple updates of the same datasets available, anomalies can be computed in two ways: using the same update level for the NRT data and the statistics, or using the most reliable update for the latter. In this study we assess the accuracy of three commonly employed 1 km MODIS NDVI anomalies (standard scores, non-exceedance probability and the vegetation condition index) with respect to (1) the delay with which they become available and (2) the option selected for their computation. We show that a large estimation error affects the earliest estimates and that this error is efficiently reduced in subsequent updates. In addition, with regard to the preferable option for computing anomalies, we empirically observe that it depends on the type of application (e.g. averaging anomaly values over an area of interest vs. detecting "drought" conditions by setting a threshold on the anomaly value) and on the anomaly type employed. Finally, we map the spatial pattern of the magnitude of NRT anomaly estimation errors over the globe and relate it to average cloudiness.
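The three anomaly types named in the abstract can be sketched for a single pixel and period as follows (the function and argument names are illustrative, not the study's code; conventions such as VCI scaling to 0–100 are the commonly used ones):

```python
import numpy as np

def ndvi_anomalies(ndvi_now, ndvi_history):
    """Three common NDVI anomaly formulations for one pixel/period.

    ndvi_history: NDVI values for the same period in previous years.
    """
    h = np.asarray(ndvi_history, float)
    # standard score: departure from the mean in units of historical spread
    z = (ndvi_now - h.mean()) / h.std(ddof=1)
    # vegetation condition index: position within the historical min-max range
    vci = 100.0 * (ndvi_now - h.min()) / (h.max() - h.min())
    # empirical non-exceedance probability: share of past years below current
    nep = (h < ndvi_now).mean()
    return z, vci, nep
```

Computing these with an early NRT update of `ndvi_now`, and recomputing as later updates arrive, is exactly the setting in which the abstract's estimation errors arise.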
Highlights
- A rapid, standardised and objective assessment of the biophysical impact of restoration interventions is proposed.
- The intervention impact is evaluated by a before–after control-impact sampling design.
- The method provides a statistical test of the no-change hypothesis and the estimation of the relative magnitude of the change.
- The method is applicable to NDVI and other remote sensing-derived variables.
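The core contrast of a before–after control-impact design can be sketched as below; this is the generic BACI estimate, not necessarily the test statistic used by the highlighted method:

```python
import numpy as np

def baci_effect(impact_before, impact_after, control_before, control_after):
    """Before-after control-impact (BACI) contrast on a remote-sensing variable.

    The control's temporal change stands in for what would have happened at
    the intervention site without the intervention; subtracting it isolates
    the intervention effect from the shared regional trend.
    """
    d_impact = np.mean(impact_after) - np.mean(impact_before)
    d_control = np.mean(control_after) - np.mean(control_before)
    return d_impact - d_control  # > 0: change beyond the regional trend
```

A no-change test then asks whether this contrast differs significantly from zero given the sampling variability of the four groups.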
In spite of the exponential growth in the amount of data that may be expected to provide greater modeling and prediction opportunities, the number and diversity of sources over which this information is fragmented is growing at an even faster rate. As a consequence, there is a real need for methods that reconcile them within an epistemically sound theoretical framework. In a statistical spatial prediction framework, classical methods are based on a multivariate approach to the problem, at the price of strong modeling hypotheses. Though new avenues have recently been opened by focusing on the integration of uncertain data sources, to the best of our knowledge there has been no systematic attempt to explicitly account for information redundancy through a data fusion procedure. Starting from the simple concept of measurement errors, this paper proposes an approach for integrating the processing of multiple pieces of information as part of the prediction process itself, through a Bayesian approach. A general formulation is first proposed for deriving the prediction distribution of a continuous variable of interest at unsampled locations using more or less uncertain (soft) information at neighboring locations. The case of multiple pieces of information is then considered, with a Bayesian solution to the problem of fusing multiple pieces of information provided as separate conditional probability distributions. Well-known methods and results are derived as limit cases. The convenient hypothesis of conditional independence is discussed in the light of information theory and the maximum entropy principle, and a methodology is suggested for the optimal selection of the most informative subset of information, if needed. An application of the methodology is finally presented and discussed on a synthetic case study.
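For the Gaussian special case, the fusion of separate conditional distributions under conditional independence has a well-known closed form, sketched below; the paper's general formulation is not restricted to Gaussians, so this is an illustrative limit case only:

```python
import numpy as np

def fuse_gaussians(means, variances, prior_mean, prior_var):
    """Fuse n conditional Gaussian distributions p(z | y_i) into one
    posterior, assuming the y_i are conditionally independent given z.

    Under that assumption p(z | y_1..y_n) is proportional to the product
    of the p(z | y_i) divided by the prior p(z) raised to the (n-1)-th
    power, so the prior's precision is discounted once per extra source
    to avoid counting it n times.
    """
    means = np.asarray(means, float)
    variances = np.asarray(variances, float)
    n = means.size
    post_prec = np.sum(1.0 / variances) - (n - 1) / prior_var
    post_mean = (np.sum(means / variances)
                 - (n - 1) * prior_mean / prior_var) / post_prec
    return post_mean, 1.0 / post_prec
```

Each source contributes through its precision (inverse variance), so redundant, highly uncertain sources barely move the fused estimate, which is the behaviour the conditional-independence discussion in the abstract is concerned with.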
Water table elevations are usually sampled in space using piezometric measurements that are unfortunately expensive to obtain and thus spatially scarce. Most of the time, piezometric data are sparsely distributed over large areas, providing limited direct information about the level of the corresponding water table. As a consequence, there is a real need for approaches that can at the same time (1) provide spatial predictions at unsampled locations and (2) enable the user to account for all potentially available secondary information sources that are in some way related to water table elevations. In this paper, a recently developed Bayesian data fusion (BDF) framework is applied to the problem of water table spatial mapping. After a brief presentation of the underlying theory, specific assumptions are made and discussed to account for a digital elevation model and for the geometry of a corresponding river network. On the basis of a data set for the Dijle basin in the northern part of Belgium, the suggested model is then implemented and its results are compared to those of standard techniques such as ordinary kriging and cokriging. The respective accuracies and precisions of these estimators are finally evaluated using a "leave-one-out" cross-validation procedure. Although the BDF methodology is illustrated here with only two secondary information sources (a digital elevation model and the geometry of a river network), the method can incorporate an arbitrary number of secondary information sources, thus opening new avenues for the important topic of data integration in a spatial mapping context.
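The leave-one-out comparison of estimators can be sketched generically as follows; `predict` stands in for any spatial estimator (ordinary kriging, cokriging, BDF, ...), and both its name and the inverse-distance stand-in below are illustrative, not the paper's implementation:

```python
import numpy as np

def loo_errors(coords, values, predict):
    """Leave-one-out cross-validation errors for a spatial predictor.

    predict(train_coords, train_values, target_coord) -> estimate
    """
    n = len(values)
    errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i  # hold out observation i
        errors[i] = predict(coords[mask], values[mask], coords[i]) - values[i]
    return errors  # mean -> bias; root mean square -> accuracy

# trivial inverse-distance-weighting stand-in for a real estimator:
def idw(train_coords, train_values, target, p=2.0):
    d = np.linalg.norm(train_coords - target, axis=1)
    w = 1.0 / np.maximum(d, 1e-12) ** p
    return np.sum(w * train_values) / np.sum(w)
```

Summarising the returned errors by their mean and root mean square gives the precision and accuracy figures used to rank the competing estimators.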