2020
DOI: 10.1111/ddi.13030
Deciphering ecology from statistical artefacts: Competing influence of sample size, prevalence and habitat specialization on species distribution models and how small evaluation datasets can inflate metrics of performance

Abstract: Aim: Sample size and species characteristics, including prevalence and habitat specialization, can influence the predictive performance of species distribution models (SDMs). There is little agreement, however, on which metric of model performance to use. Here, we directly compare AUC and partial ROC as metrics of SDM performance through analyses of the effects of species traits and sample size on SDM performance. Location: Three counties dominated by agricultural lands and coniferous forest in Oregon's Willamet…
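The title's claim that small evaluation datasets can inflate performance metrics can be illustrated with a toy simulation (a sketch of the general phenomenon, not the paper's own analysis; the score distributions below are hypothetical): for a fixed model with a moderate true AUC, a small evaluation sample gives a wide sampling distribution of empirical AUC, so a lucky small test set can report a near-perfect score.

```python
import random

def auc(pos_scores, neg_scores):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))

random.seed(1)

def draw_scores(n):
    # Hypothetical model: overlapping score distributions for presences and
    # absences, so the true AUC is moderate (~0.76), far from perfect.
    pos = [random.gauss(0.6, 0.2) for _ in range(n)]
    neg = [random.gauss(0.4, 0.2) for _ in range(n)]
    return pos, neg

spread = {}
for n in (5, 25, 100):  # evaluation points per class
    aucs = [auc(*draw_scores(n)) for _ in range(300)]
    spread[n] = max(aucs) - min(aucs)
    print(f"n={n:3d}  mean AUC={sum(aucs) / len(aucs):.3f}  "
          f"max AUC={max(aucs):.3f}  spread={spread[n]:.3f}")
```

With only 5 evaluation points per class, the maximum AUC across repeated draws approaches 1.0 even though the underlying model is mediocre; with 100 points per class the spread narrows sharply, which is the sample-size dependence the citing papers describe.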

Cited by 27 publications (24 citation statements). References 44 publications.
“…In our case, the metrics that showed the highest agreement with the ground-truth evaluation were AUC, TSS, MAE, Bias, and ECE. When using datasets with small spatial coverage, care must be taken to provide an adequate number of samples for evaluation, as metrics may be dependent on sample size [81]. In our study, it was shown that AUC and TSS are higher in datasets containing more points.…”
Section: Discussion
confidence: 99%
“…Besides sample size and coverage, other sources of uncertainty and error in the data must be carefully controlled for (see a review of main problems and possible solutions in Rocchini et al, 2011). In general, with a poor‐quality evaluation dataset, the reliability of the models will always be uncertain (Hallman & Robinson, 2020; Jiménez‐Valverde, 2020, 2021).…”
Section: Discussion
confidence: 99%
“…Current evidence is reasonably clear that careful filtering to select data only most relevant to a particular research question and only from the most experienced observers produces results that align well with results generated from highly trained professionals (Steen et al, 2019). In some cases, more than 90% of community data goes unused after the filtering process (Hallman and Robinson, 2020b). Yet those 90% can be used for thousands of other questions, such as learning how birders improve their skills through time, how the adoption of and contributions to community science differ across geography, and other aspects of the human dimensions of biodiversity interests.…”
confidence: 84%