Aim: Citizen science data are increasingly used for modelling species distributions because they offer broad spatiotemporal coverage of local observations. However, such data are often collected without experimental design or set survey methods, raising the risk that bias and noise will compromise modelled predictions. We tested the ability of species distribution models (SDMs) built from these low‐structure citizen science data to match the quality of SDMs from systematically collected data and tested whether stringent data filtering improved predictions.
Location: Northeastern USA.
Methods: We evaluated models built from a rapidly growing dataset of avian occurrences reported by birders (eBird) against models built from four independent, systematically collected datasets. We developed SDMs for 96 species using both data sources and compared their predictive abilities. We also tested whether culling eBird data by applying stringent data filters on survey effort or observer expertise improved predictions.
Results: We found that SDMs built from low‐structure citizen science data matched or exceeded performance of SDMs from systematically collected datasets for 12%–31% of species (x̄ = 22%), depending on the dataset. At least one culling option produced equivalent or better performance for 40%–70% of species (x̄ = 49%). Data culling by restricting survey effort improved predictions more than restricting by observer expertise. The optimal effort restriction differed by dataset, and for three of the datasets was further informed by species traits.
Main conclusions: Species distribution models developed using low‐structure citizen science data sometimes performed as well as those from systematic data. Culling generally improved models, but results were heterogeneous, prohibiting clear recommendations for how to cull.
Our results indicate that the growing availability of citizen science data holds potential for creating high‐quality spatial predictions, but that time should be invested in determining how best to cull datasets and that one‐size‐fits‐all solutions beyond basic outlier filtering may be hard to find.
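The two culling strategies compared above (restricting by survey effort and restricting by observer expertise) can be illustrated with a minimal sketch. The `Checklist` fields, the `cull` function, the expertise index and every threshold below are hypothetical and chosen only for illustration; they are not the filters or cut-offs used in the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Checklist:
    checklist_id: str
    duration_min: float    # survey duration in minutes (assumed field)
    distance_km: float     # distance travelled in km (assumed field)
    observer_score: float  # hypothetical expertise index, higher = more expert


def cull(
    checklists: List[Checklist],
    max_duration_min: Optional[float] = None,
    max_distance_km: Optional[float] = None,
    min_observer_score: Optional[float] = None,
) -> List[Checklist]:
    """Keep only checklists that pass every filter that was supplied."""
    kept = []
    for c in checklists:
        if max_duration_min is not None and c.duration_min > max_duration_min:
            continue
        if max_distance_km is not None and c.distance_km > max_distance_km:
            continue
        if min_observer_score is not None and c.observer_score < min_observer_score:
            continue
        kept.append(c)
    return kept


# Made-up checklists, for illustration only.
checklists = [
    Checklist("A", 30, 1.0, 0.8),
    Checklist("B", 300, 12.0, 0.9),
    Checklist("C", 45, 2.5, 0.2),
    Checklist("D", 60, 0.0, 0.6),
]

# Culling by survey effort (thresholds are assumptions).
effort_culled = cull(checklists, max_duration_min=120, max_distance_km=5)

# Culling by observer expertise (threshold is an assumption).
expertise_culled = cull(checklists, min_observer_score=0.5)
```

Because the abstract reports that the optimal effort restriction differed by dataset, thresholds such as these would need to be tuned against the reference data rather than fixed in advance.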
Spatial biases are a common feature of presence–absence data from citizen scientists. Spatial thinning can mitigate errors in species distribution models (SDMs) that use these data. When detections or non‐detections are rare, however, SDMs may suffer from class imbalance or low sample size of the minority (i.e. rarer) class. Poor predictions can result, the severity of which may vary by modelling technique. To explore the consequences of spatial bias and class imbalance in presence–absence data, we used eBird citizen science data for 102 bird species from the northeastern USA to compare spatial thinning, class balancing and majority‐only thinning (i.e. retaining all samples of the minority class). We created SDMs using two parametric or semi‐parametric techniques (generalized linear models and generalized additive models) and two machine learning techniques (random forest and boosted regression trees). We tested the predictive abilities of these SDMs using an independent and systematically collected reference dataset with a combination of discrimination (area under the receiver operator characteristic curve; true skill statistic; area under the precision‐recall curve) and calibration (Brier score; Cohen's kappa) metrics. We found large variation in SDM performance depending on thinning and balancing decisions. Across all species, there was no single best approach, with the optimal choice of thinning and/or balancing depending on modelling technique, performance metric and the baseline sample prevalence of species in the data. Spatially thinning all the data was often a poor approach, especially for species with baseline sample prevalence <0.1. For most of these rare species, balancing classes improved model discrimination between presence and absence classes using machine learning techniques, but typically hindered model calibration. 
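Majority‐only thinning, as described above, retains every sample of the minority class and thins only the majority class. The sketch below assumes a simple grid‐based thinning rule (one randomly chosen record per cell); the grid rule, the cell size, the record layout and the function names are all assumptions made for illustration and may differ from the thinning procedure used in the study.

```python
import math
import random
from collections import defaultdict


def grid_cell(lon, lat, cell_deg):
    """Index of the grid cell that contains a point (cell size in degrees)."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def thin(records, cell_deg, rng):
    """Keep one randomly chosen record per grid cell."""
    by_cell = defaultdict(list)
    for rec in records:
        by_cell[grid_cell(rec["lon"], rec["lat"], cell_deg)].append(rec)
    return [rng.choice(group) for group in by_cell.values()]


def majority_only_thin(records, cell_deg, seed=0):
    """Retain all minority-class records; spatially thin the majority class."""
    rng = random.Random(seed)
    detections = [r for r in records if r["detected"]]
    non_detections = [r for r in records if not r["detected"]]
    minority, majority = sorted([detections, non_detections], key=len)
    return minority + thin(majority, cell_deg, rng)


def prevalence(records):
    """Sample prevalence: the fraction of records that are detections."""
    return sum(1 for r in records if r["detected"]) / len(records)


# Made-up records: 2 detections and 8 non-detections spread over 3 grid cells.
records = (
    [{"lon": -72.05, "lat": 42.05, "detected": True}] * 2
    + [{"lon": -72.05, "lat": 42.05, "detected": False}] * 5
    + [{"lon": -72.15, "lat": 42.05, "detected": False}] * 2
    + [{"lon": -72.25, "lat": 42.05, "detected": False}] * 1
)

thinned = majority_only_thin(records, cell_deg=0.1)
```

In this toy example the two detections are kept and the eight non‐detections are reduced to one per occupied cell, so sample prevalence rises from 0.2 to 0.4. That shift in prevalence is the quantity the abstract flags as relevant to calibration.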
Baseline sample prevalence, sample size, modelling approach and the intended application of SDM output—whether discrimination or calibration—should guide decisions about how to thin or balance data, given the considerable influence of these methodological choices on SDM performance. For prognostic applications requiring good model calibration (vis‐à‐vis discrimination), the match between sample prevalence and true species prevalence may be the overriding feature and warrants further investigation.
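The distinction between discrimination and calibration can be made concrete with two of the metrics named above, the true skill statistic (sensitivity + specificity − 1) and the Brier score (mean squared difference between predicted probability and observed outcome). The sketch below is a plain‐Python illustration; the 0.5 threshold and the toy predictions are invented and are not results from the study.

```python
def true_skill_statistic(y_true, y_prob, threshold=0.5):
    """Discrimination: sensitivity + specificity - 1 at a given threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted_present = p >= threshold
        if y == 1 and predicted_present:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_present:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def brier_score(y_true, y_prob):
    """Calibration: mean squared error of predicted probabilities (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data: one presence among five sites (sample prevalence 0.2).
y_true = [1, 0, 0, 0, 0]

# Two invented sets of predictions that rank the sites identically.
modest = [0.6, 0.1, 0.1, 0.1, 0.1]
inflated = [0.9, 0.4, 0.4, 0.4, 0.4]

tss_modest = true_skill_statistic(y_true, modest)
tss_inflated = true_skill_statistic(y_true, inflated)
brier_modest = brier_score(y_true, modest)
brier_inflated = brier_score(y_true, inflated)
```

Both sets of predictions separate the presence from the absences perfectly at the 0.5 threshold, so their true skill statistics are equal, yet the inflated probabilities yield a larger (worse) Brier score. This is one way that a model can discriminate well while being poorly calibrated.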
The Prairie Pothole Region (PPR) of the north-central U.S. and south-central Canada contains millions of small prairie wetlands that provide critical habitat to many migrating and breeding waterbirds. Due to their small size and the relatively dry climate of the region, these wetlands are considered at high risk for negative climate change effects as temperatures increase. To estimate the potential impacts of climate change on breeding waterbirds, we predicted current and future distributions of species common in the PPR using species distribution models (SDMs). We created regional-scale SDMs for the U.S. PPR using Breeding Bird Survey occurrence records for 1971–2011 and wetland, upland, and climate variables. For each species, we predicted current distribution based on climate records for 1981–2000 and projected future distributions to climate scenarios for 2040–2049. Species were projected to, on average, lose almost half their current habitat (-46%). However, individual species projections varied widely, from +8% (Upland Sandpiper) to -100% (Wilson's Snipe). Variable importance ranks indicated that land cover (wetland and upland) variables were generally more important than climate variables in predicting species distributions. However, climate variables were relatively more important during a drought period. Projected distributions of species responses to climate change contracted within current areas of distribution rather than shifting. Given the large variation in species-level impacts, we suggest that climate change mitigation efforts focus on species projected to be the most vulnerable by enacting targeted wetland management, easement acquisition, and restoration efforts.
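The projected habitat changes quoted above are percentage changes relative to the current distribution. A minimal sketch of that arithmetic follows; the `percent_change` helper and the area values are hypothetical, with the areas chosen only so that the results reproduce the percentages quoted in the abstract (they are not the study's data).

```python
def percent_change(current_area, future_area):
    """Percentage change in predicted habitat relative to the current area."""
    if current_area <= 0:
        raise ValueError("current_area must be positive")
    return 100 * (future_area - current_area) / current_area


# Hypothetical areas in arbitrary units, not values from the study.
average_case = percent_change(1000, 540)   # corresponds to a 46% loss
largest_gain = percent_change(1000, 1080)  # corresponds to an 8% gain
total_loss = percent_change(1000, 0)       # corresponds to a 100% loss
```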