It is now well known that incomplete lineage sorting can cause serious difficulties for phylogenetic inference, but little attention has been paid to methods that attempt to overcome these difficulties by explicitly considering the processes that produce them. Here we explore approaches to phylogenetic inference designed to consider retention and sorting of ancestral polymorphism. We examine how the reconstructability of a species (or population) phylogeny is affected by (a) the number of loci used to estimate the phylogeny and (b) the number of individuals sampled per species. Even in difficult cases with considerable incomplete lineage sorting (times between divergences less than 1 N(e) generations), we found the reconstructed species trees matched the "true" species trees in at least three out of five partitions, as long as a reasonable number of individuals per species were sampled. We also studied the tradeoff between sampling more loci and sampling more individuals. Although increasing the number of loci gives more accurate trees for a given sampling effort with deeper species trees (e.g., total depth of 10 N(e) generations), sampling more individuals often gives better results than sampling more loci with shallower species trees (e.g., depth = 1 N(e)). Taken together, these results demonstrate that gene sequences retain enough signal to achieve an accurate estimate of phylogeny despite widespread incomplete lineage sorting. Continued improvement in our methods to reconstruct phylogeny near the species level will require a shift to a compound model that considers not only nucleotide or character state substitutions, but also the population genetic processes of lineage sorting. [Coalescence; divergence; population; speciation.]
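To give a sense of how much incomplete lineage sorting short divergence intervals produce, the following is a minimal sketch of the standard three-species coalescent result, not the method used in the study. It assumes a species tree ((A,B),C) with one sampled lineage per species, and an internal branch length `t_internal` expressed in coalescent units; converting from N(e) generations to coalescent units depends on ploidy and is left to the reader. The function names are hypothetical.

```python
import math
import random


def concordance_probability(t_internal):
    """Probability that a gene tree matches the species tree ((A,B),C).

    t_internal is the internal branch length in coalescent units.
    Lineages A and B coalesce on the internal branch with probability
    1 - exp(-t); otherwise all three lineages enter the root population,
    where each of the three pairs is equally likely to coalesce first.
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-t_internal)


def simulate_concordance(t_internal, replicates=100_000, seed=1):
    """Monte Carlo check of the formula above (illustrative only)."""
    rng = random.Random(seed)
    concordant = 0
    for _ in range(replicates):
        # Waiting time for A and B to coalesce (rate 1 for two lineages).
        if rng.expovariate(1.0) < t_internal:
            concordant += 1  # sorted on the internal branch
        elif rng.randrange(3) == 0:
            concordant += 1  # deep coalescence that happens to join A and B
    return concordant / replicates


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 5.0):
        print(t, concordance_probability(t), simulate_concordance(t))
```

With a very short internal branch the probability approaches 1/3, meaning a single gene tree carries almost no information about the species tree, which is why sampling strategy across loci and individuals matters in this regime.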
While studies of phylogeography and speciation in the past have largely focused on the documentation or detection of significant patterns of population genetic structure, the emerging field of statistical phylogeography aims to infer the history and processes underlying that structure, and to provide objective, rather than ad hoc, explanations. Methods for parameter estimation are now commonly used to make inferences about the demographic past. Although these approaches are well developed statistically, they typically pay little attention to geographical history. In contrast, methods that seek to reconstruct phylogeographic history are able to consider many alternative geographical scenarios, but are primarily nonstatistical, making inferences about particular biological processes without explicit reference to stochastically derived expectations. We advocate the merging of these two traditions so that statistical phylogeographic methods can provide an accurate representation of the past, consider a diverse array of processes, and yet yield a statistical estimate of that history. We discuss various conceptual issues associated with statistical phylogeographic inferences, considering especially the stochasticity of population genetic processes and assessing the confidence of phylogeographic conclusions. To this end, we present some empirical examples that utilize a statistical phylogeographic approach, and then by contrasting results from a coalescent-based approach to those from Templeton's nested cladistic analysis (NCA), we illustrate the importance of assessing error. Because NCA does not assess error in its inferences about historical processes or contemporary gene flow, we performed a small-scale study using simulated data to examine how our conclusions might be affected by such unconsidered errors. NCA did not identify the processes used to simulate the data, confusing deterministic processes with the stochastic sorting of gene lineages. There is as yet insufficient justification of NCA's ability to accurately infer or distinguish among alternative processes. We close with a discussion of some unresolved problems of current statistical phylogeographic methods to propose areas in need of future development.
The role of ecology in the origin of species has been the subject of long-standing interest to evolutionary biologists. New sources of spatially explicit ecological data allow for large-scale tests of whether speciation is associated with niche divergence or whether closely related species tend to be similar ecologically (niche conservatism). Because of the confounding effects of spatial autocorrelation of environmental variables, we generate null expectations for niche divergence for both an ecological niche modeling and a multivariate approach to address the question: do allopatrically distributed taxa occupy similar niches? In a classic system for the study of niche evolution, the Aphelocoma jays, we show that there is little evidence for niche divergence among Mexican Jay (A. ultramarina) lineages in the process of speciation, contrary to previous results. In contrast, Aphelocoma species that exist in partial sympatry in some regions show evidence for niche divergence. Our approach is widely applicable to the many cases of allopatric lineages in the beginning stages of speciation. These results do not support an ecological speciation model for Mexican Jay lineages because, in most cases, the allopatric environments they occupy are not significantly more divergent than expected under a null model. Keywords: Aphelocoma, birds, ecology, niche conservatism, niche modeling, speciation.
Supplementary data are available at Bioinformatics online.
There is a long-standing debate over whether or not the Pleistocene glaciations promoted speciation. While some models predict that extensive mixing of populations during interglacial expansion would have inhibited divergence, others postulate that divergence among allopatric glacial refugia or founder events during recolonization of previously glaciated areas would have promoted differentiation. Using a combination of traditional and coalescent-based population genetic approaches, this study finds that the glaciations did not inhibit divergence among populations of the grasshopper Melanoplus oregonensis. Instead, drift associated with recolonization of previously glaciated areas, as well as divergence among multiple allopatric glacial refugia, have both contributed to differentiation in this montane grasshopper from the 'sky islands' of the northern Rocky Mountains. Significant population structure was detected by phylogenetic and FST analyses, including significant FST values among individual pairs of sky-island populations. In addition to clustering of haplotypes within populations, there is some evidence of regional phylogeographic structure, although none of the 'regional groups' form a monophyletic clade and there is a lack of concordance between the genealogical and geographical positions of some haplotypes. However, coalescent simulations confirm there is significant regional phylogeographic structure that most likely reflects divergence among multiple ancestral refugial populations, and indicate that it is very unlikely that the observed gene tree could have been produced by the fragmentation of a single widespread ancestral population. Thus, rather than inhibiting differentiation, the glaciations appear to have promoted population divergence in M. oregonensis, suggesting that they may have contributed to the radiation of Melanoplus species during the Pleistocene.
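As a reminder of what an FST value summarizes, here is a minimal sketch of a heterozygosity-based FST for a single biallelic locus with equally weighted populations. This is an illustrative simplification and an assumption on our part: it is not the estimator used in the study, which analyzed haplotype data, and the function name is hypothetical.

```python
def fst_biallelic(allele_freqs):
    """FST = (H_T - H_S) / H_T for one biallelic locus.

    allele_freqs: frequency of one allele in each population
    (populations are weighted equally; sample-size corrections
    are deliberately omitted in this sketch).
    """
    n = len(allele_freqs)
    if n == 0:
        raise ValueError("need at least one population")
    # Mean expected heterozygosity within populations.
    h_s = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / n
    # Expected heterozygosity of the pooled population.
    p_bar = sum(allele_freqs) / n
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # locus is fixed everywhere; no differentiation to measure
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(fst_biallelic([0.5, 0.5]))  # identical populations
    print(fst_biallelic([0.0, 1.0]))  # populations fixed for different alleles
```

Values near 0 indicate populations with similar allele frequencies, while values near 1 indicate populations approaching fixation for different alleles, the pattern expected when drift acts on isolated sky-island populations.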
Statistical phylogeographic studies contribute to our understanding of the factors that influence population divergence and speciation, and that ultimately generate biogeographical patterns. The use of coalescent modelling for analyses of genetic data provides a framework for statistically testing alternative hypotheses about the timing and pattern of divergence. However, the extent to which such approaches contribute to our understanding of biogeography depends on how well the alternative hypotheses chosen capture relevant aspects of species histories. New modelling techniques, which explicitly incorporate spatio‐geographic data external to the gene trees themselves, provide a means for generating realistic phylogeographic hypotheses, even for taxa without a detailed fossil record. Here we illustrate how two such techniques – species distribution modelling and its historical extension, palaeodistribution modelling – in conjunction with coalescent simulations can be used to generate and test alternative hypotheses. In doing so, we highlight a few key studies that have creatively integrated both historical geographic and genetic data and argue for the wider incorporation of such explicit integrations in biogeographical studies.
Estimating phylogenetic relationships among closely related species can be extremely difficult when there is incongruence among gene trees and between the gene trees and the species tree. Here we show that incorporating a model of the stochastic loss of gene lineages by genetic drift into the phylogenetic estimation procedure can provide a robust estimate of species relationships, despite widespread incomplete sorting of ancestral polymorphism. This approach is applied to a group of montane Melanoplus grasshoppers for which genealogical discordance among loci and incomplete lineage sorting obscure any obvious phylogenetic relationships among species. Unlike traditional treatments where gene trees estimated using standard phylogenetic methods are implicitly equated with the species tree, with the coalescent-based approach the species tree is modeled probabilistically from the estimated gene trees. The estimated species phylogeny (the ESP) is calculated for the grasshoppers from multiple gene trees reconstructed for nuclear loci and a mitochondrial gene. This empirical application is coupled with a simulation study to explore the performance of the coalescent-based approach. Specifically, we test the accuracy of the ESP given the data based on analyses of simulated data matching the multilocus data collected in Melanoplus (i.e., data were simulated for each locus with the same number of base pairs and locus-specific mutational models). The results of the study show that ESPs can be computed using the coalescent-based approach long before reciprocal monophyly has been achieved, and that these statistical estimates are accurate. This contrasts with analyses of the empirical data collected in Melanoplus and simulated data based on concatenation of multiple loci, for which the incomplete lineage sorting of recently diverged species posed significant problems. The strengths and potential challenges associated with incorporating an explicit model of gene-lineage coalescence into the phylogenetic procedure to obtain an ESP, as illustrated by application to Melanoplus, versus concatenation and consensus approaches are discussed. This study represents a fundamental shift in how species relationships are estimated: the relationship between the gene trees and the species phylogeny is modeled probabilistically rather than equating gene trees with a species tree.
In the newly emerging field of statistical phylogeography, consideration of the stochastic nature of genetic processes and explicit reference to theoretical expectations under various models have dramatically transformed how historical processes are studied. Rather than being restricted to ad hoc explanations for observed patterns of genetic variation, assessments about the underlying evolutionary processes are now based on statistical tests of various hypotheses, as well as estimates of the parameters specified by the models. A wide range of demographical and biogeographical processes can be accommodated by these new analytical approaches, providing biologically more realistic models. Because of these advances, statistical phylogeography can provide unprecedented insights about a species' history, including decisive information about the factors that shape patterns of genetic variation, species distributions, and speciation. However, to improve our understanding of such processes, a critical examination and appreciation of the inherent difficulties of historical inference and challenges specific to testing phylogeographical hypotheses are essential. As the field of statistical phylogeography continues to take shape, many difficulties have been resolved. Nonetheless, careful attention to the complexities of testing historical hypotheses and further theoretical developments are essential to improving the accuracy of our conclusions about a species' history.