In order to map the spatial distribution of twenty tree species groups over Europe at 1 km × 1 km resolution, the ICP-Forest Level-I plot data were extended with the National Forest Inventory (NFI) plot data of eighteen countries. The NFI grids have a much smaller spacing than the ICP grid. In areas with NFI plot data, the proportions of the land area covered by the tree species were mapped by compositional kriging. Outside these areas, these proportions were mapped with a multinomial multiple logistic regression model. A soil map, a biogeographical map and bioindicators derived from temperature and precipitation data were used as predictors. Both methods ensure that the predicted proportions are in the interval [0,1] and sum to 1. The regression predictions were iteratively scaled to the National Forest Inventory statistics and the Forest map of Europe. The predicted proportions for the twenty tree species were validated by the Bhattacharyya distance between predicted and observed proportions at 230 plots withheld from the calibration data. In addition, the map with the predicted dominant species was validated by computing the error matrix. The median Bhattacharyya distance in the subarea with NFI plot data was 1.712, whereas in the subarea with ICP-Level-I data, this was 2.131. The scaling did not significantly decrease the Bhattacharyya distance. The estimated overall accuracy of this map was 43%. In areas with NFI plot data, overall accuracy was 57%, outside these areas 33%. This gain was mainly attributable to the much denser plot data, less to the prediction method.
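As an illustration of the validation measure, the following is a minimal sketch of a distance between a predicted and an observed composition. It assumes the standard definition, the negative logarithm of the Bhattacharyya coefficient; the abstract does not state which variant was used, and the function name and example proportions are hypothetical.

```python
import math


def bhattacharyya_distance(predicted, observed):
    """Distance between two compositions (non-negative proportions summing to 1).

    Assumes the standard form -ln(sum_i sqrt(p_i * q_i)); this is an
    assumption, not necessarily the variant used in the study.
    """
    if len(predicted) != len(observed):
        raise ValueError("compositions must have the same number of classes")
    # Bhattacharyya coefficient: 1 for identical compositions, 0 for disjoint ones.
    coefficient = sum(math.sqrt(p * q) for p, q in zip(predicted, observed))
    coefficient = min(coefficient, 1.0)  # guard against floating-point overshoot
    if coefficient == 0.0:
        return math.inf  # no species in common
    return abs(-math.log(coefficient))  # abs() avoids returning -0.0


# Hypothetical three-species example, for illustration only.
predicted = [0.6, 0.3, 0.1]
observed = [0.5, 0.5, 0.0]
distance = bhattacharyya_distance(predicted, observed)
```

The distance is zero when the predicted and observed proportions coincide and grows as the two compositions share less of their mass.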
The increase in digital soil mapping around the world means that appropriate and efficient sampling strategies are needed for validation. Data used for calibrating a digital soil mapping model typically are non-random samples. In such a case we recommend collection of additional independent data and validation of the soil map by a design-based sampling strategy involving probability sampling and design-based estimation of quality measures. An important advantage over validation by data-splitting or cross-validation is that model-free estimates of the quality measures and their standard errors can be obtained, and thus no assumptions on the spatial auto-correlation of prediction errors need to be made. The quality of quantitative soil maps can be quantified by the spatial cumulative distribution function (SCDF) of the prediction errors, whereas for categorical soil maps the overall purity and the map unit purities (user's accuracies) and soil class representation (producer's accuracies) are suitable quality measures. The suitability of five basic types of random sampling design for soil map validation was evaluated: simple, stratified simple, systematic, cluster and two-stage random sampling. Stratified simple random sampling is generally a good choice: it is simple to implement, estimation of the quality measures and their precision is straightforward, it gives relatively precise estimates, and no assumptions are needed in quantifying the standard error of the estimated quality measures. Validation by probability sampling is illustrated with two case studies. A categorical soil map on point support depicting soil classes in the province of Drenthe of the Netherlands (268 000 ha) was validated by stratified simple random sampling. Sub-areas with different expected purities were used as strata. The estimated overall purity was 58% with a standard error of 4%. This was 9% smaller than the theoretical purity computed with the model. Map unit purities and class representations were estimated by the ratio estimator. A quantitative soil map, depicting the average soil organic carbon (SOC) contents of pixels in an area of 81 600 ha in Senegal, was validated by random transect sampling. SOC predictions were seriously biased, and the random error was considerable. Both case studies underpin the importance of independent validation of soil maps by probability sampling, to avoid unfounded trust in visually attractive maps produced by advanced pedometric techniques.
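The design-based estimation of overall purity under stratified simple random sampling can be sketched as follows. This is a minimal illustration using the textbook stratified estimator with area weights, not the authors' code; the function name, stratum areas and sample outcomes are hypothetical.

```python
import math


def stratified_purity(strata):
    """Estimate overall map purity and its standard error.

    strata: list of (area, indicators) pairs, where area is the stratum
    area (any unit) and indicators holds 1 for a validation point whose
    mapped class matches the observed class, else 0. Points are assumed
    to be selected by simple random sampling within each stratum.
    """
    total_area = sum(area for area, _ in strata)
    estimate = 0.0
    variance = 0.0
    for area, indicators in strata:
        n = len(indicators)
        if n < 2:
            raise ValueError("each stratum needs at least two points")
        weight = area / total_area
        purity = sum(indicators) / n
        estimate += weight * purity
        # Estimated sampling variance of a proportion under simple random sampling.
        variance += weight ** 2 * purity * (1.0 - purity) / (n - 1)
    return estimate, math.sqrt(variance)


# Hypothetical data: two equally sized strata with four points each.
example_strata = [
    (100.0, [1, 1, 0, 0]),
    (100.0, [1, 1, 1, 1]),
]
purity, standard_error = stratified_purity(example_strata)
```

Because the standard error follows from the sampling design alone, no model of the spatial autocorrelation of the prediction errors is needed.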