Software is written in Python and freely available at http://www.dereneaton.com/software/.
Phylogenetic relationships among recently diverged species are often difficult to resolve due to insufficient phylogenetic signal in available markers and/or conflict among gene trees. Here we explore the use of reduced-representation genome sequencing, specifically in the form of restriction-site associated DNA (RAD), for phylogenetic inference and the detection of ancestral hybridization in non-model organisms. As a case study, we investigate Pedicularis section Cyathophora, a systematically recalcitrant clade of flowering plants in the broomrape family (Orobanchaceae). Two methods of phylogenetic inference, maximum likelihood and Bayesian concordance, were applied to data sets that included as many as 40,000 RAD loci. Both methods yielded similar topologies that included two major clades: a “rex-thamnophila” clade, composed of two species and several subspecies with relatively low floral diversity, and geographically widespread distributions at lower elevations, and a “superba” clade, composed of three species characterized by relatively high floral diversity and isolated geographic distributions at higher elevations. Levels of molecular divergence between subspecies in the rex-thamnophila clade are similar to those between species in the superba clade. Using Patterson’s D-statistic test, including a novel extension of the method that enables finer-grained resolution of introgression among multiple candidate taxa by removing the effect of their shared ancestry, we detect significant introgression among nearly all taxa in the rex-thamnophila clade, but not between clades or among taxa within the superba clade. These results suggest an important role for geographic isolation in the emergence of species barriers, by facilitating local adaptation and differentiation in the absence of homogenizing gene flow. [Concordance factors; genotyping-by-sequencing; hybridization; partitioned D-statistic test; Pedicularis; restriction-site associated DNA.]
Summary: ipyrad is a free and open source tool for assembling and analyzing restriction site-associated DNA sequence datasets using de novo and/or reference-based approaches. It is designed to be massively scalable to hundreds of taxa and thousands of samples, and can be efficiently parallelized on high performance computing clusters. It is available both as a command line interface and as a Python package with an application programming interface, the latter of which can be used interactively to write complex, reproducible scripts and implement a suite of downstream analysis tools. Availability and implementation: ipyrad is a free and open source program written in Python. Source code is available from the GitHub repository (https://github.com/dereneaton/ipyrad/), and Linux and MacOS installs are distributed through the conda package manager. Complete documentation, including numerous tutorials, and Jupyter notebooks demonstrating example assemblies and applications of downstream analysis tools are available online: https://ipyrad.readthedocs.io/.
Restriction-site associated DNA (RAD) sequencing and related methods rely on the conservation of enzyme recognition sites to isolate homologous DNA fragments for sequencing, with the consequence that mutations disrupting these sites lead to missing information. There is thus a clear expectation for how missing data should be distributed, with fewer loci recovered between more distantly related samples. This observation has led to a related expectation: that RAD-seq data are insufficiently informative for resolving deeper scale phylogenetic relationships. Here we investigate the relationship between missing information among samples at the tips of a tree and information at edges within it. We re-analyze and review the distribution of missing data across ten RAD-seq data sets and carry out simulations to determine expected patterns of missing information. We also present new empirical results for the angiosperm clade Viburnum (Adoxaceae, with a crown age >50 Ma) for which we examine phylogenetic information at different depths in the tree and with varied sequencing effort. The total number of loci, the proportion that are shared, and phylogenetic informativeness varied dramatically across the examined RAD-seq data sets. Insufficient or uneven sequencing coverage accounted for similar proportions of missing data as dropout from mutation-disruption. Simulations reveal that mutation-disruption, which results in phylogenetically distributed missing data, can be distinguished from the more stochastic patterns of missing data caused by low sequencing coverage. In Viburnum, doubling sequencing coverage nearly doubled the number of parsimony informative sites, and increased by >10X the number of loci with data shared across >40 taxa. Our analysis leads to a set of practical recommendations for maximizing phylogenetic information in RAD-seq studies. 
[hierarchical redundancy; phylogenetic informativeness; quartet informativeness; Restriction-site associated DNA (RAD) sequencing; sequencing coverage; Viburnum.]
Previous phylogenetic studies in oaks (Quercus, Fagaceae) have failed to resolve the backbone topology of the genus with strong support. Here, we utilize next-generation sequencing of restriction-site associated DNA (RAD-Seq) to resolve a framework phylogeny of a predominantly American clade of oaks whose crown age is estimated at 23–33 million years old. Using a recently developed analytical pipeline for RAD-Seq phylogenetics, we created a concatenated matrix of 1.40 E06 aligned nucleotides, constituting 27,727 sequence clusters. RAD-Seq data were readily combined across runs, with no difference in phylogenetic placement between technical replicates, which overlapped by only 43–64% in locus coverage. 17% (4,715) of the loci we analyzed could be mapped with high confidence to one or more expressed sequence tags in NCBI Genbank. A concatenated matrix of the loci that BLAST to at least one EST sequence provides approximately half as many variable or parsimony-informative characters as equal-sized datasets from the non-EST loci. The EST-associated matrix is more complete (fewer missing loci) and has slightly lower homoplasy than non-EST subsampled matrices of the same size, but there is no difference in phylogenetic support or relative attribution of base substitutions to internal versus terminal branches of the phylogeny. We introduce a partitioned RAD visualization method (implemented in the R package RADami; http://cran.r-project.org/web/packages/RADami) to investigate the possibility that suboptimal topologies supported by large numbers of loci—due, for example, to reticulate evolution or lineage sorting—are masked by the globally optimal tree. We find no evidence for strongly-supported alternative topologies in our study, suggesting that the phylogeny we recover is a robust estimate of large-scale phylogenetic patterns in the American oak clade. 
Our study is one of the first to demonstrate the utility of RAD-Seq data for inferring phylogeny in a 23–33 million year-old clade.
Introgressive hybridization challenges the concepts we use to define species and infer phylogenetic relationships. Methods for inferring historical introgression from the genomes of extant species, such as ABBA-BABA tests, are widely used; however, their results can be easily misinterpreted. Because these tests are inherently comparative, they are sensitive to the effects of missing data (unsampled species) and nonindependence (hierarchical relationships among species). We demonstrate this using genomic RADseq data sampled from all extant species in the American live oaks (Quercus series Virentes), a group notorious for hybridization. By considering all species and their phylogenetic relationships, we were able to distinguish true hybridizing lineages from those that falsely appear admixed. Six of seven species show evidence of admixture, often with multiple other species, but this is explained by introgression among a few related lineages occurring in close proximity. We identify the Cuban oak as the most admixed lineage and test alternative scenarios for its origin. The live oaks form a continuous ring-like distribution around the Gulf of Mexico, connected in Cuba, across which they could effectively exchange alleles. However, introgression appears highly localized, suggesting that oak species boundaries and their geographic ranges have remained relatively stable over evolutionary time.
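As a rough illustration of the ABBA-BABA (Patterson's D) test mentioned above, the following minimal sketch counts the two discordant site patterns for four aligned sequences and computes D = (nABBA - nBABA) / (nABBA + nBABA). It assumes a fixed topology (((P1, P2), P3), O) and one haploid sequence per taxon. The function name `patterson_d` and the toy sequences are hypothetical and are not taken from any of the papers or software listed here. Significance testing (for example by bootstrapping or jackknifing loci) is omitted.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Return (D, n_abba, n_baba) for four aligned haploid sequences.

    Assumes the topology (((P1, P2), P3), O); the outgroup allele is
    treated as ancestral ("A") and any other allele as derived ("B").
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    n_abba = 0
    n_baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        # Skip sites with missing or ambiguous data in any taxon.
        if not {a, b, c, o} <= valid:
            continue
        # Both patterns require P3 to carry the derived allele.
        if c == o:
            continue
        if a == o and b == c:
            n_abba += 1  # P2 shares the derived allele with P3
        elif b == o and a == c:
            n_baba += 1  # P1 shares the derived allele with P3

    total = n_abba + n_baba
    d = (n_abba - n_baba) / total if total else float("nan")
    return d, n_abba, n_baba


# Toy example (invented data): two ABBA sites and one BABA site.
d, n_abba, n_baba = patterson_d("AAGGAA", "GGGAAA", "GGGGGA", "AAAAAA")
print(d, n_abba, n_baba)
```

A positive D indicates an excess of derived alleles shared between P2 and P3, and a negative D an excess shared between P1 and P3. As the abstract above cautions, such an excess can also arise from unsampled lineages or from the hierarchical relationships among the taxa compared, so the value alone does not identify which lineages hybridized.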
The nature and timing of evolution of niche differentiation among closely related species remains an important question in ecology and evolution. The American live oak clade, Virentes, which spans the unglaciated temperate and tropical regions of North America and Mesoamerica, provides an instructive system in which to examine speciation and niche evolution. We generated a fossil-calibrated phylogeny of Virentes using RADseq data to estimate divergence times and used nuclear microsatellites, chloroplast sequences and an intron region of nitrate reductase (NIA-i3) to examine genetic diversity within species, rates of gene flow among species and ancestral population size of disjunct sister species. Transitions in functional and morphological traits associated with ecological and climatic niche axes were examined across the phylogeny. We found the Virentes to be monophyletic with three subclades, including a southwest clade, a southeastern US clade and a Central American/Cuban clade. Despite high leaf morphological variation within species and transpecific chloroplast haplotypes, RADseq and nuclear SSR data showed genetic coherence of species. We estimated a crown date for Virentes of 11 Ma and implicated the formation of the Sea of Cortés in a speciation event ~5 Ma. Tree height at maturity, associated with fire tolerance, differs among the sympatric species, while freezing tolerance appears to have diverged repeatedly across the tropical-temperate divide. Sympatric species thus show evidence of ecological niche differentiation but share climatic niches, while allopatric and parapatric species conserve ecological niches, but diverge in climatic niches. The mode of speciation and/or degree of co-occurrence may thus influence which niche axis plants diverge along.
The past decade has seen a major breakthrough in our ability to easily and inexpensively sequence genome‐scale data from diverse lineages. The development of high‐throughput sequencing and long‐read technologies has ushered in the era of phylogenomics, where hundreds to thousands of nuclear genes and whole organellar genomes are routinely used to reconstruct evolutionary relationships. As a result, understanding which options are best suited for a particular set of questions can be difficult, especially for those just starting in the field. Here, we review the most recent advances in plant phylogenomic methods and make recommendations for project‐dependent best practices and considerations. We focus on the costs and benefits of different approaches in regard to the information they provide researchers and the questions they can address. We also highlight unique challenges and opportunities in plant systems, such as polyploidy, reticulate evolution, and the use of herbarium materials, identifying optimal methodologies for each. Finally, we draw attention to lingering challenges in the field of plant phylogenomics, such as reusability of data sets, and look at some up‐and‐coming technologies that may help propel the field even further.