The area under the receiver operating characteristic (ROC) curve, known as the AUC, is currently considered the standard method to assess the accuracy of predictive distribution models. It avoids the supposed subjectivity of the threshold-selection process, in which continuous probability-derived scores are converted to a binary presence-absence variable, by summarizing overall model performance over all possible thresholds. In this manuscript we review some of the features of this measure and question its reliability as a comparative measure of accuracy between model results. We do not recommend using AUC, for five reasons: (1) it ignores the predicted probability values and the goodness-of-fit of the model; (2) it summarizes test performance over regions of the ROC space in which one would rarely operate; (3) it weights omission and commission errors equally; (4) it gives no information about the spatial distribution of model errors; and, most importantly, (5) the spatial extent over which models are fitted strongly influences the rate of well-predicted absences and, consequently, the AUC scores.
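Point (1) can be made concrete with a small sketch. The AUC is equivalent to the Mann-Whitney rank statistic, i.e. the probability that a randomly chosen presence outranks a randomly chosen absence, so it is invariant to any monotone rescaling of the scores. The models and values below are hypothetical, purely for illustration:

```python
# Illustrative sketch: AUC depends only on the rank ordering of scores,
# not on their calibrated probability values.

def auc(scores_pos, scores_neg):
    """Probability that a random presence outranks a random absence
    (Mann-Whitney U formulation of the ROC AUC); ties count as 0.5."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Two hypothetical models: model B's probabilities are badly calibrated
# (compressed near 1.0) yet preserve model A's ranking, so AUC is identical
# even though B's goodness-of-fit would be far worse.
pos_a, neg_a = [0.9, 0.8, 0.6], [0.7, 0.4, 0.2]
pos_b, neg_b = [0.99, 0.98, 0.96], [0.97, 0.94, 0.92]

print(auc(pos_a, neg_a))  # 0.888... (8/9)
print(auc(pos_b, neg_b))  # identical, despite the miscalibration
```

Any calibration-sensitive measure (e.g. a goodness-of-fit statistic on the predicted probabilities) would separate these two models; the AUC cannot.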
Risk maps summarizing landscape suitability of novel areas for invading species can be valuable tools for preventing species' invasions or controlling their spread, but methods employed for development of such maps remain variable and unstandardized. We discuss several considerations in development of such models, including types of distributional information that should be used, the nature of explanatory variables that should be incorporated, and caveats regarding model testing and evaluation. We highlight that, in the case of invasive species, such distributional predictions should aim to derive the best hypothesis of the potential distribution of the species by using (1) all distributional information available, including information from both the native range and other invaded regions; (2) predictors linked as directly as is feasible to the physiological requirements of the species; and (3) modelling procedures that carefully avoid overfitting to the training data. Finally, model testing and evaluation should focus on well-predicted presences, and less on efficient prediction of absences; a k-fold regional cross-validation test is discussed.
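The regional cross-validation idea can be sketched as follows: instead of random k-fold splits, records are grouped by a geographic region label (a hypothetical field here), so each fold tests the model's ability to transfer to an unseen region rather than to interpolate among spatially mixed points. The record structure and region names are assumptions for illustration:

```python
# Sketch of a regional k-fold split, assuming each occurrence record
# carries a (hypothetical) region label.

def regional_kfold(records, region_key):
    """Yield (held-out region, train, test) splits, one region per fold."""
    regions = sorted({r[region_key] for r in records})
    for held_out in regions:
        train = [r for r in records if r[region_key] != held_out]
        test = [r for r in records if r[region_key] == held_out]
        yield held_out, train, test

records = [
    {"lon": -3.7, "lat": 40.4, "region": "Iberia"},
    {"lon": 2.35, "lat": 48.9, "region": "France"},
    {"lon": 12.5, "lat": 41.9, "region": "Italy"},
    {"lon": -8.0, "lat": 39.5, "region": "Iberia"},
]

for held_out, train, test in regional_kfold(records, "region"):
    print(held_out, len(train), len(test))
```

Because the held-out region shares no points with the training set, a fold's score reflects spatial transferability, which is the relevant question when projecting an invader into novel areas.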
Aim Nowadays, large amounts of species distribution data and software for implementing different species distribution modelling methods are freely available through the internet. As a result, methodological works that analyse the relative performance of modelling techniques, as well as those that study which species characteristics affect their performance, are necessary. We discuss three important topics that must be kept in mind when modelling species distributions, namely (i) the distinction between potential and realized distribution, (ii) the effect of the relative occurrence area of the species on the results of the evaluation of model performance, and (iii) the general inaccuracy of the predictions of the realized distribution provided by species distribution modelling methods.
Location Unspecific.
Methods Using some recent papers as a basis, we illustrate the three issues mentioned above and discuss the negative implications of neglecting them.
Results Considering a potential-realized distribution gradient, different modelling methods may be arranged along this gradient according to their ability to model either concept. Complex techniques may be more suitable for modelling the realized distribution than simple ones, which may be more appropriate for estimating the potential distribution. Comparisons among techniques must consider this scenario. The relative occurrence area of the species influences the evaluation scores, implying that models of rare species will unavoidably yield higher discrimination values. Moreover, discrimination values that are usually reported in the literature may imply considerable over- or underestimations of the distribution of the species.
Main conclusions It is extremely important to establish a solid conceptual and methodological framework on which the emergent field of species distribution modelling can stand and develop.