Collinearity refers to the non independence of predictor variables, usually in a regression‐type analysis. It is a common feature of any descriptive ecological data set and can be a problem for parameter estimation because it inflates the variance of regression parameters and hence potentially leads to the wrong identification of relevant predictors in a statistical model. Collinearity is a severe problem when a model is trained on data from one region or time, and predicted to another with a different or unknown structure of collinearity. To demonstrate the reach of the problem of collinearity in ecology, we show how relationships among predictors differ between biomes, change over spatial scales and through time. Across disciplines, different approaches to addressing collinearity problems have been developed, ranging from clustering of predictors, threshold‐based pre‐selection, through latent variable methods, to shrinkage and regularisation. Using simulated data with five predictor‐response relationships of increasing complexity and eight levels of collinearity we compared ways to address collinearity with standard multiple regression and machine‐learning approaches. We assessed the performance of each approach by testing its impact on prediction to new data. In the extreme, we tested whether the methods were able to identify the true underlying relationship in a training dataset with strong collinearity by evaluating its performance on a test dataset without any collinearity. We found that methods specifically designed for collinearity, such as latent variable methods and tree based models, did not outperform the traditional GLM and threshold‐based pre‐selection. Our results highlight the value of GLM in combination with penalised methods (particularly ridge) and threshold‐based pre‐selection when omitted variables are considered in the final interpretation. However, all approaches tested yielded degraded predictions under change in collinearity structure and the ‘folk lore’‐thresholds of correlation coefficients between predictor variables of |r| >0.7 was an appropriate indicator for when collinearity begins to severely distort model estimation and subsequent prediction. The use of ecological understanding of the system in pre‐analysis variable selection and the choice of the least sensitive statistical approaches reduce the problems of collinearity, but cannot ultimately solve them.
Species data held in museum and herbaria, survey data and opportunistically observed data are a substantial information resource. A key challenge in using these data is the uncertainty about where an observation is located. This is important when the data are used for species distribution modelling (SDM), because the coordinates are used to extract the environmental variables and thus, positional error may lead to inaccurate estimation of the species-environment relationship. The magnitude of this effect is related to the level of spatial autocorrelation in the environmental variables. Using local spatial association can be relevant because it can lead to the identification of the specific occurrence records that cause the largest drop in SDM accuracy. Therefore, in this study, we tested whether the SDM predictions are more affected by positional uncertainty originating from locations that have lower local spatial association in their predictors. We performed this experiment for Spain and the Netherlands, using simulated datasets derived from well known species distribution models (SDMs). We used the K statistic to quantify the local spatial association in the predictors at each species occurrence location. A probabilistic approach using Monte Carlo simulations was employed to introduce the error in the species locations. The results revealed that positional uncertainty in species occurrence data at locations with low local spatial association in predictors reduced the prediction accuracy of the SDMs. We propose that local spatial association is a way to identify the species occurrence records that require treatment for positional uncertainty. We also developed and present a tool in the R environment to target observations that are likely to create error in the output from SDMs as a result of positional uncertainty.
Savanna ecosystems are characterized by the co‐occurrence of trees and grasses. In this paper, we argue that the balance between trees and grasses is, to a large extent, determined by the indirect interactive effects of herbivory and fire. These effects are based on the positive feedback between fuel load (grass biomass) and fire intensity. An increase in the level of grazing leads to reduced fuel load, which makes fire less intense and, thus, less damaging to trees and, consequently, results in an increase in woody vegetation. The system then switches from a state with trees and grasses to a state with solely trees. Similarly, browsers may enhance the effect of fire on trees because they reduce woody biomass, thus indirectly stimulating grass growth. This consequent increase in fuel load results in more intense fire and increased decline of biomass. The system then switches from a state with solely trees to a state with trees and grasses. We maintain that the interaction between fire and herbivory provides a mechanistic explanation for observed discontinuous changes in woody and grass biomass. This is an alternative for the soil degradation mechanism, in which there is a positive feedback between the amount of grass biomass and the amount of water that infiltrates into the soil. The soil degradation mechanism predicts no discontinuous changes, such as bush encroachment, on sandy soils. Such changes, however, are frequently observed. Therefore, the interactive effects of fire and herbivory provide a more plausible explanation for the occurrence of discontinuous changes in savanna ecosystems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.