Aim: Ecological data collected by the general public are valuable for addressing a wide range of ecological research and conservation planning questions, and there has been a rapid increase in the scope and volume of data available. However, data from eBird or other large-scale projects with volunteer observers typically present several challenges that can impede robust ecological inferences. These challenges include spatial bias, variation in effort and species reporting bias. Innovation: We use the example of estimating species distributions with data from eBird, a community science or citizen science (CS) project. We estimate two widely used metrics of species distributions: encounter rate and occupancy probability. For each metric, we critically assess the impact of data processing steps that either degrade or refine the data used in the analyses. CS data density varies widely across the globe, so we also test whether differences in model performance are robust to sample size. Main conclusions: Model performance improved when data processing and analytical methods addressed the challenges arising from CS data; however, the degree of improvement varied with species and data density. The largest gains we observed in model performance were achieved with (1) the use of complete checklists (where observers report all the species they detect and identify, allowing non-detections to be inferred) and (2) the use of covariates describing variation in effort and detectability for each checklist. Occupancy models were more robust to a lack of complete checklists. Improvements in model performance with data refinement were more evident with larger sample sizes. In general, we found that the value of each refinement varied by situation and we encourage researchers to assess the benefits in other scenarios. These approaches will enable researchers to more effectively harness the vast ecological knowledge that exists within CS data for conservation and basic research.
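As an illustration of the complete-checklist refinement described in the abstract above, the following is a minimal Python sketch, not the authors' code. The `Checklist` record, its field names and the example species are hypothetical; the abstract does not specify which effort covariates were used, so duration and distance are assumptions. The sketch keeps only complete checklists, infers non-detections for a target species, and carries the effort fields forward so a model could use them as covariates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checklist:
    # Hypothetical record layout, assumed for illustration only.
    checklist_id: str
    complete: bool        # observer reported every species detected and identified
    duration_min: float   # assumed effort covariate
    distance_km: float    # assumed effort covariate
    species: frozenset    # species reported on this checklist


def detection_table(checklists, target):
    """Keep complete checklists and infer detection/non-detection of `target`."""
    rows = []
    for c in checklists:
        if not c.complete:
            # A missing species on an incomplete checklist says nothing
            # about non-detection, so the checklist is dropped.
            continue
        rows.append({
            "checklist_id": c.checklist_id,
            "detected": int(target in c.species),
            "duration_min": c.duration_min,
            "distance_km": c.distance_km,
        })
    return rows


def encounter_rate(rows):
    """Naive encounter rate: share of retained checklists with a detection."""
    return sum(r["detected"] for r in rows) / len(rows) if rows else float("nan")


checklists = [
    Checklist("A", True, 60.0, 1.5, frozenset({"Wood Thrush", "Ovenbird"})),
    Checklist("B", True, 20.0, 0.5, frozenset({"Ovenbird"})),
    Checklist("C", False, 5.0, 0.0, frozenset({"Wood Thrush"})),
]
rows = detection_table(checklists, "Wood Thrush")
```

The naive rate computed here ignores effort; in a fitted model the duration and distance columns would enter as predictors rather than being discarded.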
Avian migration is one of Earth's largest processes of biomass transport, involving billions of birds. We estimated continental biomass flows of nocturnal avian migrants across the contiguous United States using a network of 143 weather radars. We show that, relative to biomass leaving in autumn, proportionally more biomass returned in spring across the southern United States than across the northern United States. Neotropical migrants apparently achieved higher survival during the combined migration and non-breeding period, despite an average three- to fourfold longer migration distance, compared with a more northern assemblage of mostly temperate-wintering migrants. Additional mortality expected with longer migration distances was probably offset by high survival in the (sub)tropics. Nearctic-Neotropical migrants relying on a 'higher survivorship' life-history strategy may be particularly sensitive to variations in survival on the overwintering grounds, highlighting the need to identify and conserve important non-breeding habitats.
Aim: To improve the accuracy of inferences on habitat associations and distribution patterns of rare species by combining machine-learning, spatial filtering and resampling to address class imbalance and spatial bias of large volumes of citizen science data. Innovation: Modelling rare species' distributions is a pressing challenge for conservation and applied research. Often, a large number of surveys are required before enough detections occur to model distributions of rare species accurately, resulting in a data set with a high proportion of non-detections (i.e. class imbalance). Citizen science data can provide a cost-effective source of surveys but likely suffer from class imbalance. Citizen science data also suffer from spatial bias, likely from preferential sampling. To correct for class imbalance and spatial bias, we used spatial filtering to under-sample the majority class (non-detection) while maintaining all of the limited information from the minority class (detection). We investigated the use of spatial under-sampling with randomForest models and compared it to common approaches used for imbalanced data, the synthetic minority oversampling technique (SMOTE), weighted random forest and balanced random forest models. Model accuracy was assessed using kappa, Brier score and AUC. We demonstrate the method by evaluating habitat associations and seasonal distribution patterns using citizen science data for a rare species, the tricoloured blackbird (Agelaius tricolor). Main conclusions: Spatial under-sampling increased the accuracy of each model and outperformed the approach typically used to direct under-sampling in the SMOTE algorithm. Our approach is the first to characterize winter distribution and movement of tricoloured blackbirds. Our results show that tricoloured blackbirds are positively associated with grassland, pasture and wetland habitats, and negatively associated with high elevations or evergreen forests during both winter and breeding seasons. The seasonal differences in distribution indicate that individuals move to the coast during the winter, as suggested by historical accounts. Keywords: citizen science, class imbalance, random forest, spatial bias, species distribution model, tricoloured blackbird
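The spatial under-sampling idea in the abstract above can be sketched roughly as follows. This is an assumption-laden illustration, not the authors' procedure: the abstract does not state how the spatial filter was defined, so the regular latitude/longitude grid, the `cell_size_deg` parameter, the one-per-cell rule and the record keys (`lat`, `lon`, `detected`) are all hypothetical. The sketch keeps every detection and retains one randomly chosen non-detection per grid cell.

```python
import random
from collections import defaultdict


def spatial_undersample(records, cell_size_deg=0.1, seed=0):
    """Keep all detections; keep one random non-detection per grid cell.

    The grid and the one-per-cell rule are assumptions made for this sketch.
    """
    rng = random.Random(seed)

    # Minority class: retained in full.
    detections = [r for r in records if r["detected"]]

    # Majority class: grouped by the grid cell each record falls in.
    by_cell = defaultdict(list)
    for r in records:
        if not r["detected"]:
            cell = (int(r["lat"] // cell_size_deg),
                    int(r["lon"] // cell_size_deg))
            by_cell[cell].append(r)

    # One non-detection per occupied cell, visited in a fixed order so the
    # result is reproducible for a given seed.
    sampled = [rng.choice(by_cell[cell]) for cell in sorted(by_cell)]
    return detections + sampled


records = [
    {"lat": 38.01, "lon": -121.01, "detected": 1},
    {"lat": 38.01, "lon": -121.01, "detected": 0},
    {"lat": 38.02, "lon": -121.02, "detected": 0},
    {"lat": 38.03, "lon": -121.03, "detected": 0},
    {"lat": 39.55, "lon": -120.55, "detected": 0},
]
balanced = spatial_undersample(records)
```

Thinning non-detections by cell reduces both the class imbalance and the over-representation of heavily visited locations, which is the combined effect the abstract attributes to spatial under-sampling. The resulting subset would then be passed to a classifier such as a random forest.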
Aim: Information on species' habitat associations and distributions, across a wide range of spatial and temporal scales, is a fundamental source of ecological knowledge. However, collecting information at relevant scales is often cost prohibitive, although it is essential for framing the broader context of more focused research and conservation efforts. Citizen science has been signalled as an increasingly important source to fill in data gaps where information is needed to make comprehensive and robust inferences on species distributions. However, there are perceived trade-offs of combining highly structured, scientific survey data with largely unstructured, citizen science data. Methods: We explore these trade-offs by applying a simplified approach of filtering citizen science data to resemble structured survey data and analyse both sources of data under a common framework. To accomplish this, we integrated high-resolution survey data on shorebirds in the northern Central Valley of California with observations in eBird for the entire region that were filtered to improve their quality. Results: The integration of survey data with the filtered citizen science data resulted in improved inference and increased the extent and accuracy of distribution models on shorebirds for the Central Valley. The structured surveys improved the overall accuracy of ecological inference over models using citizen science data only by increasing the representation of data collected from high-quality habitats for shorebirds. Main conclusions: The practical approach we have shown for data integration can also be used to improve the efficiency of designing biological surveys in the context of larger, citizen science monitoring efforts, ultimately reducing the financial and time expenditures typically required of monitoring programs and focused research. The simple method we present can be used to integrate other types of data with more localized efforts, ultimately improving our ecological knowledge on the distribution and habitat associations of species of conservation concern worldwide.