Species delimitation is the act of identifying species-level biological diversity. In recent years, the field has witnessed a dramatic increase in the number of methods available for delimiting species. However, most recent investigations use only a handful (i.e. 2-3) of the available methods, often for unstated reasons. Because the parameter space that is potentially relevant to species delimitation far exceeds the parameterization of any existing method, every method necessarily makes a number of simplifying assumptions, any one of which could be violated in a particular system. We suggest that researchers apply a wide range of species delimitation analyses to their data and place their trust in delimitations that are congruent across methods. Incongruence across the results of different methods is evidence either of a difference in the power to detect cryptic lineages among the approaches used or of a violation of the assumptions of one or more methods. In either case, the inferences drawn from species delimitation studies should be conservative, because in most contexts it is better to fail to delimit species than to falsely delimit entities that do not represent actual evolutionary lineages.
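One way to make "trust only congruent delimitations" concrete, while erring toward under-splitting, is to lump any samples that at least one method places together. A species boundary then survives only if every method puts the samples on either side of it into different groups. The sketch below is an illustration under that assumption, not a procedure described in the abstract; the function name `conservative_consensus` and the dict-of-labels input format are hypothetical.

```python
from itertools import combinations


def conservative_consensus(partitions):
    """Lump samples that ANY method assigns to the same group.

    `partitions` is a list of dicts (one per delimitation method), each
    mapping a sample ID to that method's cluster label. All dicts must
    cover the same samples. Returns consensus groups as frozensets.
    """
    samples = sorted(partitions[0])
    parent = {s: s for s in samples}

    def find(s):
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for labels in partitions:
        for a, b in combinations(samples, 2):
            if labels[a] == labels[b]:
                # This method lumps a and b, so the consensus does too.
                parent[find(b)] = find(a)

    groups = {}
    for s in samples:
        groups.setdefault(find(s), set()).add(s)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


# Hypothetical results from two methods on four samples.
method_a = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
method_b = {"s1": "X", "s2": "Y", "s3": "Z", "s4": "Z"}
print(conservative_consensus([method_a, method_b]))
```

Here method A lumps s1 with s2 and method B lumps s3 with s4, so the consensus recognizes two groups, fewer than either method alone. Lumping is transitive, which is what makes the rule conservative: a chain of pairwise agreements can merge groups that no single method merged directly.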
The conservation status of most plant species is currently unknown, despite the fundamental role of plants in ecosystem health. To facilitate the costly process of conservation assessment, we developed a predictive protocol using a machine-learning approach to predict the conservation status of over 150,000 land plant species. Our study uses open-source geographic, environmental, and morphological trait data, making this the largest assessment of conservation risk to date and the only global assessment for plants. Our results indicate that a large number of unassessed species are likely at risk and identify several geographic regions with the greatest need for conservation efforts, many of which are not currently recognized as regions of global concern. By providing conservation-relevant predictions at multiple spatial and taxonomic scales, predictive frameworks such as the one developed here fill a pressing need for biodiversity science.
Empirical phylogeographic studies have progressively sampled greater numbers of loci over time, in part motivated by theoretical papers showing that estimates of key demographic parameters improve as the number of loci increases. Recently, next-generation sequencing has been applied to questions about organismal history, with the promise of revolutionizing the field. However, no systematic assessment of how phylogeographic data sets have changed over time with respect to overall size and information content has been performed. Here, we quantify the changing nature of these genetic data sets over the past 20 years, focusing on papers published in Molecular Ecology. We found that the number of independent loci, the total number of alleles sampled and the total number of single nucleotide polymorphisms (SNPs) per data set have increased over time, with particularly dramatic increases within the past 5 years. Interestingly, uniparentally inherited organellar markers (e.g. animal mitochondrial and plant chloroplast DNA) continue to represent an important component of phylogeographic data. Single-species studies (cf. comparative studies) that focus on vertebrates (particularly fish and, to some extent, birds) represent the gold standard of phylogeographic data collection. Based on the current trajectory seen in our survey data, forecast modelling indicates that the median number of SNPs per data set for studies published by the end of the year 2016 may approach ~20 000.
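The abstract does not say which forecast model was used. Assuming, purely for illustration, that median data-set size grows roughly exponentially, a forecast reduces to a straight-line least-squares fit on the log scale followed by back-transformation. The years and medians below are invented and are not the survey's data; `fit_log_linear` and `forecast` are hypothetical helper names.

```python
import math


def fit_log_linear(years, values):
    """Least-squares fit of log(value) = a + b * (year - year0).

    Years are centered on the first year (year0) for numerical stability.
    Returns (year0, a, b); the implied annual growth factor is exp(b).
    """
    year0 = years[0]
    xs = [y - year0 for y in years]
    logs = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_y) for x, ly in zip(xs, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return year0, a, b


def forecast(fit, year):
    """Back-transform the fitted trend to the original (count) scale."""
    year0, a, b = fit
    return math.exp(a + b * (year - year0))


years = [2010, 2011, 2012, 2013, 2014]
medians = [100, 200, 400, 800, 1600]  # invented values that double each year
fit = fit_log_linear(years, medians)
print(round(forecast(fit, 2015)))  # 3200
```

Fitting on the log scale keeps the forecast positive and treats growth as multiplicative; extrapolating any such trend beyond the observed years remains an assumption about the future trajectory.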
This survey provides baseline information for understanding the evolution of phylogeographic data sets and underscores the fact that development of analytical methods for handling very large genetic data sets will be critical for facilitating growth of the field.

Keywords: DNA sequences, information content, phylogeography, sampling, single nucleotide polymorphisms, temporal trends

Introduction

Phylogeographers have been working to collect multilocus data ever since a series of theoretical papers pertinent to the discipline demonstrated that estimates of key demographic parameters improve as the number of loci increases (e.g. Edwards & Beerli 2000; Hey & Nielsen 2004; Felsenstein 2006; Carling & Brumfield 2007). Recent improvements in DNA sequencing technology have led to platforms with greater speed, resolution and/or output (e.g. Margulies et al. 2005; Bentley et al. 2008; Rothberg et al. 2011) when compared to the traditional Sanger method. These technological advances, together with the development of general-purpose protocols for discovering and screening many DNA sequence polymorphisms arrayed across a species' genome (e.g. Baird et al. 2008; Kerstens et al. 2009; Faircloth et al. 2012; Peterson et al. 2012), are transforming the field of phylogeography to one that is no longer data limited. Investigations concerned with reconstructing long-term population history generally require large numbers of sampled alleles (i.e. many individuals and populations), across multiple loci, to adequately characterize levels of diversity and spatial genetic structuring (McCor...
Model checking is a critical part of Bayesian data analysis, yet it remains largely unused in systematic studies. Phylogeny estimation has recently moved into an era of increasingly complex models that simultaneously account for multiple evolutionary processes; however, the statistical fit of these models to the data has rarely been tested. Here we develop a posterior predictive simulation-based model check for a commonly used multispecies coalescent model, implemented in *BEAST, and apply it to 25 published data sets. We show that poor model fit is detectable in the majority of data sets, that this poor fit can mislead phylogenetic estimation, and that in some cases it stems from processes of inherent interest to systematists. We suggest that as systematists scale up to phylogenomic data sets, which will be subject to a heterogeneous array of evolutionary processes, critically evaluating the fit of models to data is an analytical step that can no longer be ignored.
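The core logic of a posterior predictive check is independent of the model being checked: simulate replicate data sets from parameter values drawn from the posterior, compute a test statistic on each, and ask how often the replicates are at least as extreme as the observed data. The sketch below is not the *BEAST multispecies coalescent check described above; it substitutes a toy normal model and the sample variance as the test statistic, and the function names and "posterior" sample are hypothetical stand-ins.

```python
import random
import statistics


def posterior_predictive_pvalue(observed_stat, posterior_draws, simulate, statistic, rng):
    """Posterior predictive p-value: the fraction of replicate data sets,
    each simulated from one posterior draw, whose test statistic is at
    least as large as the observed statistic. Values near 0 or 1 flag
    aspects of the data the model fails to reproduce.
    """
    exceed = 0
    for theta in posterior_draws:
        replicate = simulate(theta, rng)
        if statistic(replicate) >= observed_stat:
            exceed += 1
    return exceed / len(posterior_draws)


# Toy model (hypothetical): 20 observations ~ Normal(mu, 1).
def simulate_normal(mu, rng, n=20):
    return [rng.gauss(mu, 1.0) for _ in range(n)]


rng = random.Random(1)
posterior_mu = [rng.gauss(0.0, 0.2) for _ in range(500)]  # stand-in posterior sample
# Observed variance 4.0: far more spread than the unit-variance model can produce.
p_poor = posterior_predictive_pvalue(4.0, posterior_mu, simulate_normal, statistics.variance, rng)
# Observed variance 1.0: typical of the model.
p_good = posterior_predictive_pvalue(1.0, posterior_mu, simulate_normal, statistics.variance, rng)
print(p_poor, p_good)
```

An extreme p-value (here `p_poor`, close to 0) signals that the fitted model cannot generate data resembling the observations on that statistic, which is the kind of poor fit the abstract reports detecting in most data sets.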
Model-based analyses are common in phylogeographic inference because they parameterize processes such as population division, gene flow and expansion that are of interest to biologists. Approximate Bayesian computation is a model-based approach that can be customized to any empirical system and used to calculate the relative posterior probability of several models, provided that suitable models can be identified for comparison. The question of how to identify suitable models is explored using data from Plethodon idahoensis, a salamander that inhabits the North American inland northwest temperate rainforest. First, we conduct an ABC analysis using five models suggested by previous research, calculate the relative posterior probabilities and find that a simple model of population isolation has the best fit to the data (PP=0.70). In contrast to this subjective choice of models to include in the analysis, we also specify models in a more objective manner by simulating prior distributions for 143 models that included panmixia, population isolation, change in effective population size, migration and range expansion. We then identify a smaller subset of models for comparison by generating an expectation of the highest posterior probability that a false model is likely to achieve due to chance and calculate the relative posterior probabilities of only those models that exceed this expected level. A model that parameterized divergence with population expansion and gene flow in one direction offered the best fit to the P. idahoensis data (in contrast to an isolation-only model from the first analysis). Our investigation demonstrates that the determination of which models to include in ABC model choice experiments is a vital component of model-based phylogeographic analysis.
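The "relative posterior probability" of competing models in ABC can be estimated by simple rejection: simulate summary statistics under each model (with parameters drawn from that model's prior), keep the simulations closest to the observed summaries, and take each model's share of the retained simulations. The sketch below shows only that step, with equal prior model probabilities; it does not implement the study's second step of estimating how high a false model's posterior probability can climb by chance. The two simulators and their summary statistics are hypothetical toys standing in for coalescent simulations.

```python
import math
import random


def abc_model_choice(observed, models, n_per_model, n_accept, rng):
    """Rejection ABC model choice with equal prior model probabilities.

    `models` maps a model name to a function(rng) that draws parameters
    from that model's prior, simulates data, and returns a tuple of
    summary statistics. The n_accept simulations closest (Euclidean
    distance) to `observed` are retained, and the share of retained
    simulations from each model estimates its relative posterior probability.
    """
    sims = []
    for name, simulate in models.items():
        for _ in range(n_per_model):
            sims.append((math.dist(simulate(rng), observed), name))
    sims.sort(key=lambda pair: pair[0])
    counts = {name: 0 for name in models}
    for _, name in sims[:n_accept]:
        counts[name] += 1
    return {name: count / n_accept for name, count in counts.items()}


# Hypothetical toy simulators returning (differentiation, diversity) summaries.
def isolation(rng):
    return (rng.uniform(0.3, 0.9), rng.uniform(0.001, 0.01))


def panmixia(rng):
    return (rng.uniform(0.0, 0.1), rng.uniform(0.001, 0.01))


rng = random.Random(42)
posterior = abc_model_choice((0.5, 0.005), {"isolation": isolation, "panmixia": panmixia},
                             n_per_model=1000, n_accept=100, rng=rng)
print(posterior)
```

In real analyses the summary statistics are usually standardized before computing distances; here the unscaled differentiation statistic dominates the distance, which is acceptable only because the toy models differ mainly on it.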
While genetic diversity within species is influenced by both geographical distance and environmental gradients, it is unclear what other factors are likely to promote population genetic structure. Using a machine learning framework and georeferenced DNA sequences from more than 8000 species, we demonstrate that geographical attributes of the species range, including total size, latitude and elevation, are the most important predictors of which species are likely to contain structured genetic variation. While latitude is well known as an important predictor of biodiversity, our work suggests that it also plays a key role in shaping diversity within species.
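The abstract ranks range size, latitude and elevation as the most important predictors but does not state how importance was measured. Permutation importance is one common, model-agnostic option: shuffle one predictor at a time and record how much predictive accuracy drops. In the sketch below the classifier is a hypothetical fixed rule standing in for a trained model, and the data are invented.

```python
import random


def accuracy(predict, X, y):
    """Share of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats, rng):
    """Mean drop in accuracy when each feature column is shuffled in turn.

    Shuffling breaks the link between a feature and the labels; a large
    drop means the model relies on that feature. Rows are tuples.
    """
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + (value,) + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Invented data: features are (log range size, elevation in km); the
# hypothetical "structured" label depends on range size only.
rng = random.Random(0)
X = [(rng.uniform(0, 10), rng.uniform(0, 3)) for _ in range(200)]
y = [row[0] > 5 for row in X]


def predict(row):
    return row[0] > 5


print(permutation_importance(predict, X, y, n_repeats=5, rng=rng))
```

Because the toy rule ignores elevation, shuffling that column leaves accuracy unchanged and its importance is zero, whereas shuffling range size drops accuracy to roughly chance.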
Significance
Only an estimated 1 to 10% of Earth’s species have been formally described. This discrepancy between the number of species with a formal taxonomic description and the actual number of species (i.e., the Linnean shortfall) hampers research across the biological sciences. To explore whether the Linnean shortfall results from poor taxonomic practice or from insufficient taxonomic effort, we applied machine-learning techniques to build a predictive model that identifies named species likely to contain hidden diversity. Results indicate that small-bodied species with large, climatically variable ranges are most likely to contain hidden species. These attributes generally match those identified in the taxonomic literature, indicating that the Linnean shortfall is caused by societal underinvestment in taxonomy rather than by poor taxonomic practice.
Allopatry is commonly used to predict boundaries in species delimitation investigations under the assumption that currently allopatric distributions are indicative of reproductive isolation; however, species ranges are known to change over time. Incorporating a temporal perspective of geographic distributions should improve species delimitation. To explore this, we investigate three species of western Plethodon salamanders that have shifted their ranges since the end of the Pleistocene. We generate species distribution models (SDMs) of the current range, hindcast these models onto a climatic model for 21 Ka, and use three molecular approaches to delimit species in an integrated fashion. In contrast to expectations based on the current distribution, we detect no independent lineages in the species with allopatric and patchy distributions (Plethodon vandykei and Plethodon larselli). The SDMs indicate that probable habitat is more expansive than their current ranges, especially during the last glacial maximum (LGM) (21 Ka). However, two independent lineages were detected in Plethodon idahoensis, which has a contiguous distribution, possibly due to isolation in multiple glacial refugia. Results indicate that historical SDMs are better predictors of species boundaries than current distributions, and strongly imply that researchers should incorporate SDMs and hindcasting into their investigations and the development of species hypotheses.
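The abstract does not name the SDM algorithm, so the sketch below swaps in one of the simplest possible models, a rectilinear (min/max) climate envelope in the spirit of BIOCLIM, purely to show what hindcasting means: the envelope is fitted to present-day climate at occurrence sites and then projected onto paleoclimate layers for the LGM. All climate values and grid cells are invented, and `fit_envelope` and `project` are hypothetical names.

```python
def fit_envelope(occurrence_climate):
    """Rectilinear climate envelope: per-variable (min, max) across the
    climate values recorded at known occurrence sites."""
    return [(min(col), max(col)) for col in zip(*occurrence_climate)]


def project(envelope, grid):
    """IDs of grid cells whose climate lies inside the envelope on every variable.

    `grid` maps a cell ID to a tuple of climate variables in the same order
    used to fit the envelope (current or paleoclimate layers).
    """
    return sorted(
        cell
        for cell, climate in grid.items()
        if all(lo <= value <= hi for (lo, hi), value in zip(envelope, climate))
    )


# Invented values: (mean annual temperature in °C, annual precipitation in mm).
occurrences = [(8.0, 1800), (9.5, 2200), (7.0, 2000)]
current_grid = {"c1": (8.5, 1900), "c2": (12.0, 1500), "c3": (7.5, 2100)}
lgm_grid = {"c1": (3.0, 1900), "c2": (8.0, 2000), "c3": (2.5, 2100), "c4": (9.0, 1850)}

envelope = fit_envelope(occurrences)
print(project(envelope, current_grid))  # ['c1', 'c3']
print(project(envelope, lgm_grid))      # ['c2', 'c4']
```

In this toy, the cells with suitable climate at the LGM differ entirely from the present-day ones, the kind of range shift that can make current allopatry a misleading guide to species boundaries.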