2020
DOI: 10.1111/ddi.13030
Deciphering ecology from statistical artefacts: Competing influence of sample size, prevalence and habitat specialization on species distribution models and how small evaluation datasets can inflate metrics of performance

Abstract: Aim: Sample size and species characteristics, including prevalence and habitat specialization, can influence the predictive performance of species distribution models (SDMs). There is little agreement, however, on which metric of model performance to use. Here, we directly compare AUC and partial ROC as metrics of SDM performance through analyses of the effects of species traits and sample size on SDM performance. Location: Three counties dominated by agricultural lands and coniferous forest in Oregon's Willamet…
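The title's claim that small evaluation datasets can inflate performance metrics can be illustrated with a toy simulation (a sketch of the general phenomenon, not the paper's own analysis; the score distributions below are hypothetical): for a fixed model with a moderate true AUC, a small evaluation sample gives a wide sampling distribution of empirical AUC, so a lucky small test set can report a near-perfect score.

```python
import random

def auc(pos_scores, neg_scores):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))

random.seed(1)

def draw_scores(n):
    # Hypothetical model: overlapping score distributions for presences and
    # absences, so the true AUC is moderate (~0.76), far from perfect.
    pos = [random.gauss(0.6, 0.2) for _ in range(n)]
    neg = [random.gauss(0.4, 0.2) for _ in range(n)]
    return pos, neg

spread = {}
for n in (5, 25, 100):  # evaluation points per class
    aucs = [auc(*draw_scores(n)) for _ in range(300)]
    spread[n] = max(aucs) - min(aucs)
    print(f"n={n:3d}  mean AUC={sum(aucs) / len(aucs):.3f}  "
          f"max AUC={max(aucs):.3f}  spread={spread[n]:.3f}")
```

With only 5 evaluation points per class, the maximum AUC across repeated draws approaches 1.0 even though the underlying model is mediocre; with 100 points per class the spread narrows sharply, which is the sample-size dependence the citing papers describe.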

Cited by 27 publications (24 citation statements). References 44 publications.
“…In our case, the metrics that showed the highest agreement with the ground-truth evaluation were AUC, TSS, MAE, Bias, and ECE. When using datasets with small spatial coverage, care must be taken to provide an adequate number of samples for evaluation, as metrics may be dependent on sample size [81]. In our study, it was shown that AUC and TSS are higher in datasets containing more points.…”
Section: Discussion
confidence: 99%
“…Besides sample size and coverage, other sources of uncertainty and error in the data must be carefully controlled for (see a review of main problems and possible solutions in Rocchini et al, 2011). In general, with a poor‐quality evaluation dataset, the reliability of the models will always be uncertain (Hallman & Robinson, 2020; Jiménez‐Valverde, 2020, 2021).…”
Section: Discussion
confidence: 99%
“…Current evidence is reasonably clear that careful filtering to select data only most relevant to a particular research question and only from the most experienced observers produces results that align well with results generated from highly trained professionals (Steen et al, 2019). In some cases, more than 90% of community data goes unused after the filtering process (Hallman and Robinson, 2020b). Yet those 90% can be used for thousands of other questions, such as learning how birders improve their skills through time, how the adoption of and contributions to community science differ across geography, and other aspects of the human dimensions of biodiversity interests.…”
confidence: 84%