A statistical model is developed for estimating species richness and accumulation by formulating these community-level attributes as functions of model-based estimators of species occurrence while accounting for imperfect detection of individual species. The model requires a sampling protocol wherein repeated observations are made at a collection of sample locations selected to be representative of the community. This temporal replication provides the data needed to resolve the ambiguity between species absence and nondetection when species are unobserved at sample locations. Estimates of species richness and accumulation are computed for two communities, an avian community and a butterfly community. Our model-based estimates suggest that detection failures in many bird species were attributed to low rates of occurrence, as opposed to simply low rates of detection. We estimate that the avian community contains a substantial number of uncommon species and that species richness greatly exceeds the number of species actually observed in the sample. In fact, predictions of species accumulation suggest that even doubling the number of sample locations would not have revealed all of the species in the community. In contrast, our analysis of the butterfly community suggests that many species are relatively common and that the estimated richness of species in the community is nearly equal to the number of species actually detected in the sample. Our predictions of species accumulation suggest that the number of sample locations actually used in the butterfly survey could have been cut in half and the asymptotic richness of species still would have been attained. Our approach of developing occurrence-based summaries of communities while allowing for imperfect detection of species is broadly applicable and should prove useful in the design and analysis of surveys of biodiversity.
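To make the idea of expressing accumulation as a function of species occurrence concrete, here is a minimal sketch. It assumes species occur independently across sample locations with species-specific occurrence probabilities; the values in `occurrence_probs` are hypothetical and are not estimates from the abstract. Detection is ignored, so the curve counts species expected to occur in at least one of `n_sites` locations, not species expected to be observed.

```python
def expected_species_accumulation(occurrence_probs, n_sites):
    """Expected number of species occurring in at least one of n_sites locations.

    Assumes independence among sites and species (an illustrative assumption).
    """
    return sum(1.0 - (1.0 - psi) ** n_sites for psi in occurrence_probs)


# Hypothetical community: a few common species and many uncommon ones.
occurrence_probs = [0.8, 0.6, 0.5] + [0.02] * 20

# Accumulation curve evaluated at 0, 10, 20, ..., 100 sample locations.
site_counts = list(range(0, 101, 10))
curve = [expected_species_accumulation(occurrence_probs, m) for m in site_counts]

for m, expected in zip(site_counts, curve):
    print(f"{m:3d} sites: {expected:5.2f} species expected")
```

With many low-occurrence species the curve keeps rising as sites are added, which is the qualitative pattern the abstract describes for the avian community.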
Aim: During the past decade ecologists have attempted to estimate the parameters of species distribution models by combining locations of species presence observed in opportunistic surveys with spatially referenced covariates of occurrence. Several statistical models have been proposed for the analysis of presence-only data, but these models have largely ignored the effects of imperfect detection and survey bias. In this paper I describe a model-based approach for the analysis of presence-only data that accounts for errors in the detection of individuals and for biased selection of survey locations. Innovation: I develop a hierarchical, statistical model that allows presence-only data to be analysed in conjunction with data acquired independently in planned surveys. One component of the model specifies the spatial distribution of individuals within a bounded, geographic region as a realization of a spatial point process. A second component of the model specifies two kinds of observations, the detection of individuals encountered during opportunistic surveys and the detection of individuals encountered during planned surveys. Main conclusions: Using mathematical proof and simulation-based comparisons, I demonstrate that biases induced by errors in detection or biased selection of survey locations can be reduced or eliminated by using the hierarchical model to analyse presence-only data in conjunction with counts observed in planned surveys. I show that a relatively small number of high-quality data (from planned surveys) can be used to leverage the information in presence-only observations, which usually have broad spatial coverage but may not be informative of both occurrence and detectability of individuals. Because a variety of sampling protocols can be used in planned surveys, this approach to the analysis of presence-only data is widely applicable.
In addition, since the point-process model is formulated at the level of an individual, it can be extended to account for biological interactions between individuals and temporal changes in their spatial distributions.
Multinomial models with unknown index ("sample size") arise in many practical settings. In practice, Bayesian analysis of such models has proved difficult because the dimension of the parameter space is not fixed, being in some cases a function of the unknown index. We describe a data augmentation approach to the analysis of this class of models that provides for a generic and efficient Bayesian implementation. Under this approach, the data are augmented with all-zero detection histories. The resulting augmented dataset is modeled as a zero-inflated version of the complete-data model where an estimable zero-inflation parameter takes the place of the unknown multinomial index. Interestingly, data augmentation can be justified as being equivalent to imposing a discrete uniform prior on the multinomial index. We provide three examples involving estimating the size of an animal population, estimating the number of diabetes cases in a population using the Rasch model, and the motivating example of estimating the number of species in an animal community with latent probabilities of species occurrence and detection.
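The augmentation idea can be sketched for the simplest case of a closed population with a constant detection probability. The example below is an illustration under stated assumptions, not the authors' implementation: it swaps the Bayesian analysis for a coarse grid search over the zero-inflated binomial likelihood, and the detection counts, the number of occasions `K`, and the augmented size `M` are all hypothetical.

```python
import math
from collections import Counter

K = 5      # sampling occasions (hypothetical)
M = 200    # size of the augmented dataset (hypothetical upper bound on N)

# Hypothetical detection frequencies for the n = 40 individuals seen at least once.
observed_counts = [1] * 20 + [2] * 12 + [3] * 6 + [4] * 2
n_observed = len(observed_counts)

# Data augmentation: pad with all-zero detection histories up to M rows.
augmented_counts = observed_counts + [0] * (M - n_observed)
count_freq = Counter(augmented_counts)


def zero_inflated_loglik(psi, p, count_freq, K):
    """Log-likelihood of a zero-inflated binomial for the augmented data."""
    loglik = 0.0
    for y, freq in count_freq.items():
        binom = math.comb(K, y) * p ** y * (1.0 - p) ** (K - y)
        # A zero row is either a real but undetected individual or a structural zero.
        lik = psi * binom + ((1.0 - psi) if y == 0 else 0.0)
        loglik += freq * math.log(lik)
    return loglik


# Coarse grid search standing in for a proper optimizer or MCMC sampler.
grid = [i / 100 for i in range(1, 100)]
psi_hat, p_hat = max(
    ((psi, p) for psi in grid for p in grid),
    key=lambda pair: zero_inflated_loglik(pair[0], pair[1], count_freq, K),
)

# The zero-inflation parameter takes the place of the unknown index N.
N_hat = M * psi_hat
print(f"psi_hat={psi_hat:.2f}, p_hat={p_hat:.2f}, N_hat={N_hat:.0f}")
```

The choice of `M` only needs to be comfortably larger than any plausible population size; the estimate of `N` is recovered as `M * psi_hat`.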
Summary 1. Recent advances in occupancy estimation that adjust for imperfect detection have provided substantial improvements over traditional approaches and are receiving considerable use in applied ecology. To estimate and adjust for detectability, occupancy modelling requires multiple surveys at a site and requires the assumption of 'closure' between surveys, i.e. no changes in occupancy between surveys. Violations of this assumption could bias parameter estimates; however, little work has assessed model sensitivity to violations of this assumption or how commonly such violations occur in nature. 2. We apply a modelling procedure that can test for closure to two avian point-count data sets in Montana and New Hampshire, USA, that exemplify time-scales at which closure is often assumed. These data sets illustrate different sampling designs that allow testing for closure but are currently rarely employed in field investigations. Using a simulation study, we then evaluate the sensitivity of parameter estimates to changes in site occupancy and evaluate a power analysis developed for sampling designs that is aimed at limiting the likelihood of closure. 3. Application of our approach to point-count data indicates that habitats may frequently be open to changes in site occupancy at time-scales typical of many occupancy investigations, with 71% and 100% of species investigated in Montana and New Hampshire, respectively, showing violation of closure across time periods of 3 weeks and 8 days, respectively. 4. Simulations suggest that models assuming closure are sensitive to changes in occupancy. Power analyses further suggest that the modelling procedure we apply can effectively test for closure. 5. Synthesis and applications.
Our demonstration that sites may be open to changes in site occupancy over time-scales typical of many occupancy investigations, combined with the sensitivity of models to violations of the closure assumption, highlights the importance of properly addressing the closure assumption in both sampling designs and analysis. Furthermore, inappropriately applying closed models could have negative consequences when monitoring rare or declining species for conservation and management decisions, because violations of closure typically lead to overestimates of the probability of occurrence.
Understanding and accurately modeling species distributions lies at the heart of many problems in ecology, evolution, and conservation. Multiple sources of data are increasingly available for modeling species distributions, such as data from citizen science programs, atlases, museums, and planned surveys. Yet reliably combining data sources can be challenging because data sources can vary considerably in their design, gradients covered, and potential sampling biases. We review, synthesize, and illustrate recent developments in combining multiple sources of data for species distribution modeling. We identify five ways in which multiple sources of data are typically combined for modeling species distributions. These approaches vary in their ability to accommodate sampling design, bias, and uncertainty when quantifying environmental relationships in species distribution models. Many of the challenges for combining data are solved through the prudent use of integrated species distribution models: models that simultaneously combine different data sources on species locations to quantify environmental relationships for explaining species distribution. We illustrate these approaches using planned survey data on 24 species of birds coupled with opportunistically collected eBird data in the southeastern United States. This example illustrates some of the benefits of data integration, such as increased precision in environmental relationships, greater predictive accuracy, and accounting for sample bias. Yet it also illustrates challenges of combining data sources with vastly different sampling methodologies and amounts of data. We provide one solution to this challenge through the use of weighted joint likelihoods. Weighted joint likelihoods provide a means to emphasize data sources based on different criteria (e.g., sample size), and we find that weighting improves predictions for all species considered. 
We conclude by providing practical guidance on combining multiple sources of data for modeling species distributions.
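The general shape of a weighted joint likelihood can be illustrated with a deliberately simple toy. The sketch below assumes two data sources that both record detections governed by one shared probability, and it ignores covariates, detection error, and sampling bias; the counts and the weight of 0.02 are hypothetical, and this is not the weighting scheme or model used in the review.

```python
import math


def bernoulli_loglik(prob, successes, trials):
    """Log-likelihood of `successes` out of `trials` under a shared probability."""
    return successes * math.log(prob) + (trials - successes) * math.log(1.0 - prob)


def weighted_joint_loglik(prob, sources):
    """Sum of per-source log-likelihoods, each scaled by its weight."""
    return sum(w * bernoulli_loglik(prob, s, n) for w, s, n in sources)


def maximize_on_grid(sources, grid_size=999):
    """Grid-search maximizer standing in for a real optimizer."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda q: weighted_joint_loglik(q, sources))


# Each source is (weight, detections, visits); all numbers are hypothetical.
planned = (1.0, 12, 40)              # small planned survey
opportunistic = (1.0, 900, 2000)     # large opportunistic data set
downweighted = (0.02, 900, 2000)     # same data, weight scaled toward the planned sample size

unweighted_estimate = maximize_on_grid([planned, opportunistic])
weighted_estimate = maximize_on_grid([planned, downweighted])

print(f"unweighted: {unweighted_estimate:.3f}, weighted: {weighted_estimate:.3f}")
```

Without weights the large opportunistic data set dominates the estimate; down-weighting it pulls the estimate back toward the planned survey.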
Environmental DNA (eDNA) methods are used to detect DNA that is shed into the aquatic environment by cryptic or low density species. Applied in eDNA studies, occupancy models can be used to estimate occurrence and detection probabilities and thereby account for imperfect detection. However, occupancy terminology has been applied inconsistently in eDNA studies, and many have calculated occurrence probabilities while not considering the effects of imperfect detection. Low detection of invasive giant constrictors using visual surveys and traps has hampered the estimation of occupancy and detection estimates needed for population management in southern Florida, USA. Giant constrictor snakes pose a threat to native species and the ecological restoration of the Florida Everglades. To assist with detection, we developed species-specific eDNA assays using quantitative PCR (qPCR) for the Burmese python (Python molurus bivittatus), Northern African python (P. sebae), boa constrictor (Boa constrictor), and the green (Eunectes murinus) and yellow anaconda (E. notaeus). Burmese pythons, Northern African pythons, and boa constrictors are established and reproducing, while the green and yellow anaconda have the potential to become established. We validated the python and boa constrictor assays using laboratory trials and tested all species in 21 field locations distributed in eight southern Florida regions. Burmese python eDNA was detected in 37 of 63 field sampling events; however, the other species were not detected. Although eDNA was heterogeneously distributed in the environment, occupancy models were able to provide the first estimates of detection probabilities, which were greater than 91%. Burmese python eDNA was detected along the leading northern edge of the known population boundary. The development of informative detection tools and eDNA occupancy models can improve conservation efforts in southern Florida and support more extensive studies of invasive constrictors. 
Generic sampling design and terminology are proposed to standardize and clarify interpretations of eDNA-based occupancy models.
We develop a parameterization of the beta-binomial mixture that provides sensible inferences about the size of a closed population when probabilities of capture or detection vary among individuals. Three classes of mixture models (beta-binomial, logistic-normal, and latent-class) are fitted to recaptures of snowshoe hares for estimating abundance and to counts of bird species for estimating species richness. In both sets of data, rates of detection appear to vary more among individuals (animals or species) than among sampling occasions or locations. The estimates of population size and species richness are sensitive to model-specific assumptions about the latent distribution of individual rates of detection. We demonstrate using simulation experiments that conventional diagnostics for assessing model adequacy, such as deviance, cannot be relied on for selecting classes of mixture models that produce valid inferences about population size. Prior knowledge about sources of individual heterogeneity in detection rates, if available, should be used to help select among classes of mixture models that are to be used for inference.
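One way to see why abundance estimates are sensitive to the assumed distribution of detection rates is to compare the probability of never being detected under a beta-binomial mixture and under a plain binomial with the same mean detection rate. The sketch below uses the standard shape parameters `a` and `b`, not the parameterization developed in the abstract, and the values of `K`, `a`, `b`, and `n_observed` are hypothetical.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(y, K, a, b):
    """P(Y = y) when Y | p ~ Binomial(K, p) and p ~ Beta(a, b)."""
    return math.comb(K, y) * math.exp(log_beta(y + a, K - y + b) - log_beta(a, b))


def binomial_pmf(y, K, p):
    """P(Y = y) for Y ~ Binomial(K, p)."""
    return math.comb(K, y) * p ** y * (1.0 - p) ** (K - y)


K = 5                 # sampling occasions (hypothetical)
a, b = 0.5, 1.5       # beta shape parameters (hypothetical)
mean_p = a / (a + b)  # mean detection probability, 0.25 here

p0_heterogeneous = beta_binomial_pmf(0, K, a, b)
p0_homogeneous = binomial_pmf(0, K, mean_p)

# Population size implied by n observed individuals: n / (1 - P(never detected)).
n_observed = 40
N_heterogeneous = n_observed / (1.0 - p0_heterogeneous)
N_homogeneous = n_observed / (1.0 - p0_homogeneous)

print(f"P(0) beta-binomial: {p0_heterogeneous:.3f}, binomial: {p0_homogeneous:.3f}")
print(f"implied N: {N_heterogeneous:.1f} vs {N_homogeneous:.1f}")
```

Because the probability of zero detections is larger under heterogeneity, the same number of observed individuals implies a larger population, which is why the choice of mixture class matters for inference.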