In ecological niche modeling, it is very common that only presence data are available. However, most existing methods for evaluating the accuracy of presence-absence (binary) predictions of species require presence-absence data. The aim of this study is to present a new method for accuracy assessment that does not rely on absence data. Two new statistics, F_pb and F_cpb, were derived based on presence-background data. Using six generated virtual species, we applied DOMAIN, generalized linear modeling (GLM), and maximum entropy (MAXENT) to produce different species presence-absence predictions. To investigate the effectiveness of the new statistics in accuracy assessment, we used F_pb, F_cpb, the traditional F-measure (F), the kappa coefficient, the true skill statistic (TSS), the area under the receiver operating characteristic curve (AUC), and the contrast validation index (CVI) to evaluate the accuracy of the predictions, and we compared the behaviors of these accuracy measures. The effectiveness of F_pb for threshold selection and estimation of species prevalence was also investigated. Experimental results show that F_cpb is an estimate of F. The Pearson's correlation coefficient (COR) between F_cpb and F is 0.9882, with a root-mean-square error (RMSE) of 0.0171. In general, F_pb, F_cpb, F, the kappa coefficient, TSS, and CVI can sort models by the accuracy of binary prediction, but AUC is not appropriate for evaluating the accuracy of binary prediction. For DOMAIN, GLM, and MAXENT, selecting the threshold by maximizing F_pb yields accuracies similar to those obtained by maximizing F. In addition, estimating species prevalence from the binary output, with maximizing F_pb as the thresholding method, is significantly more accurate than simply averaging the original continuous output. The best estimate of prevalence is provided by the binary output of MAXENT, with an RMSE of 0.0116. Finally, we conclude that the new method is promising for accuracy assessment, threshold selection, and estimation of species prevalence, all of which are important but challenging problems with presence-only data. Because it does not require absence data, the new method will have important applications in ecological niche modeling.
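For orientation, the presence-absence measures that the abstract compares against (the traditional F-measure, the kappa coefficient, and TSS) can be computed from a 2x2 confusion matrix. The sketch below uses their standard textbook definitions. It does not implement F_pb or F_cpb, whose formulas are not given in this abstract, and the function name and the example counts are hypothetical.

```python
def binary_accuracy_measures(tp, fp, fn, tn):
    """Standard presence-absence accuracy measures from confusion-matrix counts.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives. Degenerate inputs (e.g. no
    predicted presences) are not handled in this sketch.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # recall: fraction of presences predicted present
    specificity = tn / (tn + fp)   # fraction of absences predicted absent
    precision = tp / (tp + fp)     # fraction of predicted presences that are real

    # Traditional F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)

    # True skill statistic.
    tss = sensitivity + specificity - 1

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    return {"F": f_measure, "kappa": kappa, "TSS": tss}


# Hypothetical counts, for illustration only.
measures = binary_accuracy_measures(tp=40, fp=10, fn=20, tn=30)
```

All three measures need the absence-based counts `fp` and `tn`, which is why they cannot be computed directly from presence-only data.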
1. The receiver operating characteristic (ROC) and precision-recall (PR) plots have been widely used to evaluate the performance of species distribution models. Plotting the ROC/PR curves requires a traditional test set with both presence and absence data (the PA approach), but species absence data are usually not available in reality. Plotting the ROC/PR curves from presence-only data while treating background data as pseudo-absence data (the PO approach) may provide misleading results. 2. In this study, we propose a new approach, the PB approach, which calibrates the ROC/PR curves from presence and background data using user-provided information on a constant c. Here, c defines the probability that species occurrence is detected (labeled), and an estimate of c can also be derived from the PB-based ROC/PR plots, provided that a model with good discrimination ability is available. We used five virtual species and a real aerial photograph to test the effectiveness of the proposed PB-based ROC/PR plots. Different models (or classifiers) were trained from presence and background data with various sample sizes. The ROC/PR curves plotted by the PA approach were used to benchmark the curves plotted by the PO and PB approaches. 3. Experimental results show that the curves and the areas under the curves from the PB approach are more similar to those from the PA approach than are those from the PO approach. The PB-based ROC/PR plots also provide highly accurate estimates of c in our experiment. 4. We conclude that the proposed PB-based ROC/PR plots can provide valuable complements to the existing model assessment methods, and they also provide an additional way to estimate the constant c (or species prevalence) from presence and background data.
1. The receiver operating characteristic (ROC) and precision-recall (PR) plots have been widely used to evaluate the performance of species distribution models. Plotting ROC/PR curves requires a traditional test set with both presence and absence data (the PA approach), but species absence data are usually not available in reality. Plotting ROC/PR curves from presence-only data while treating background data as pseudo-absence data (the PO approach) may provide misleading results. 2. In this study, we propose a new approach, the PB approach, which calibrates the ROC/PR curves from presence and background data using user-provided information on a constant c. An estimate of c can also be derived from the PB-based ROC/PR plots, provided that a model with good discrimination ability is available. We used three virtual species and a real aerial photograph to test the effectiveness of the proposed PB-based ROC/PR plots. Different models (or classifiers) were trained from presence and background data with various sample sizes. The ROC/PR curves plotted by the PA approach were used to benchmark the curves plotted by the PO and PB approaches. 3. Experimental results show that the curves and the areas under the curves from the PB approach are more similar to those from the PA approach than are those from the PO approach. The PB-based ROC/PR plots also provide highly accurate estimates of c in our experiment. 4. We conclude that the proposed PB-based ROC/PR plots can provide valuable complements to existing model assessment methods, and they also provide an additional way to estimate the constant c (or species prevalence) from presence and background data.
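To illustrate why treating background points as pseudo-absences can distort an ROC curve, the following is a minimal sketch and not the authors' calibration procedure. It assumes that background points are a random sample of the whole landscape, so a fraction equal to the species prevalence are in fact presences, and it assumes that prevalence is known. The abstract parameterizes the calibration by the constant c instead; the link between c and prevalence is not spelled out here, so prevalence is used directly. Function and parameter names are hypothetical.

```python
def background_fpr(tpr, fpr, prevalence):
    """False positive rate observed when background points stand in for absences.

    Under the random-background assumption, the share of background points
    predicted present is a prevalence-weighted mix of the true positive rate
    and the true (absence-based) false positive rate.
    """
    return prevalence * tpr + (1 - prevalence) * fpr


def corrected_fpr(tpr, fpr_background, prevalence):
    """Invert the mixture above to recover the absence-based false positive rate."""
    if not 0 <= prevalence < 1:
        raise ValueError("prevalence must be in [0, 1)")
    return (fpr_background - prevalence * tpr) / (1 - prevalence)


# Hypothetical operating point at one threshold, for illustration only.
tpr, fpr, prevalence = 0.8, 0.2, 0.3
fpr_bg = background_fpr(tpr, fpr, prevalence)      # inflated by hidden presences
fpr_back = corrected_fpr(tpr, fpr_bg, prevalence)  # recovers the original fpr
```

Under these assumptions, the uncorrected background-based false positive rate is never lower than the absence-based rate at any threshold where the true positive rate is at least the false positive rate, so an uncorrected curve sits to the right of the presence-absence curve. The gap grows with prevalence.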