Genome sequence data contain abundant information about genealogical history, but methods for extracting and interpreting this information are not yet fully developed. We analyzed genome sequences for multiple accessions of the selfing plant, Arabidopsis thaliana, with the goal of better understanding its genealogical history. As expected from accessions of the same species, we found much discordance between nuclear gene trees. Nonetheless, we inferred the optimal population tree under the assumption that all discordance is due to incomplete lineage sorting. To cope with the size of the data (many genes and many taxa), our pipeline is based on parallel computing and divides the problem into four-taxon trees. However, just because a population tree can be estimated does not mean that the assumptions of the multispecies coalescent model hold. Therefore, we implemented a new, nonparametric test to evaluate whether a population tree adequately explains the observed quartet frequencies (the frequencies of gene trees with each resolution of each four-taxon set). This test also considers other models: panmixia and a partially resolved population tree, that is, a tree in which some nodes are collapsed into local panmixia. We found that a partially resolved population tree provides the best fit to the data, providing evidence for tree-like structure within A. thaliana, qualitatively similar to what might be expected between different, closely related species. Further, we show that the pattern of deviation from expectations can be used to identify instances of introgression and detect one clear case of reticulation among ecotypes that have come into contact in the United Kingdom. Our study illustrates how we can use genome sequence data to evaluate whether phylogenetic relationships are strictly tree-like or reticulating.
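To make the idea of quartet frequencies concrete, the sketch below computes the gene-tree frequencies expected for a single four-taxon set under the multispecies coalescent and compares them with observed counts. It is an illustration only, not the nonparametric test described in the abstract: it uses a plain Pearson chi-square statistic instead, and the function names, the branch length `t` (in coalescent units), and the example counts are all hypothetical.

```python
import math

def expected_quartet_frequencies(t):
    """Expected frequencies of the three resolutions of one four-taxon set
    under the multispecies coalescent.

    t is the internal branch length of the population tree in coalescent
    units. t = 0 corresponds to panmixia (a collapsed node), where all
    three resolutions are equally likely.
    """
    minor = math.exp(-t) / 3.0   # each resolution that disagrees with the tree
    major = 1.0 - 2.0 * minor    # the resolution that matches the tree
    return (major, minor, minor)

def chi_square_stat(observed_counts, expected_freqs):
    """Pearson chi-square statistic (an assumption of this sketch,
    not the test used in the study)."""
    n = sum(observed_counts)
    return sum((o - n * e) ** 2 / (n * e)
               for o, e in zip(observed_counts, expected_freqs))

# Hypothetical gene-tree counts for one four-taxon set.
counts = (520, 245, 235)
for t in (0.0, 0.5):
    freqs = expected_quartet_frequencies(t)
    print(t, freqs, chi_square_stat(counts, freqs))
```

A smaller statistic at some `t > 0` than at `t = 0` would point towards tree-like structure for that four-taxon set, whereas two unequal minor frequencies are the kind of deviation that incomplete lineage sorting alone does not predict.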
Summary: Polyploidy is common and an important evolutionary factor in most land plant lineages, but it is rare in gymnosperms. Coast redwood (Sequoia sempervirens) is one of just two polyploid conifer species and the only hexaploid. Evidence from fossil guard cell size suggests that polyploidy in Sequoia dates to the Eocene. Numerous hypotheses about the mechanism of polyploidy and parental genome donors have been proposed, based primarily on morphological and cytological data, but it remains unclear how Sequoia became polyploid and why this lineage overcame an apparent gymnosperm barrier to whole-genome duplication (WGD). We sequenced transcriptomes and used phylogenetic inference, Bayesian concordance analysis and paralog age distributions to resolve relationships among gene copies in hexaploid coast redwood and close relatives. Our data show that hexaploidy in coast redwood is best explained by autopolyploidy or, if there was allopolyploidy, it happened within the Californian redwood clade. We found that duplicate genes have more similar sequences than expected, given the age of the inferred polyploidization. Conflict between molecular and fossil estimates of WGD can be explained if diploidization occurred very slowly following polyploidization. We extrapolate from this to suggest that the rarity of polyploidy in gymnosperms may be due to slow diploidization in this clade.
Previous research suggests that Gossypium has undergone a 5‐ to 6‐fold multiplication following its divergence from Theobroma. However, the number of events and where they occurred in the Malvaceae phylogeny remain unknown. We analyzed transcriptomic and genomic data from representatives of eight of the nine Malvaceae subfamilies. Phylogenetic analysis of nuclear data placed Dombeya (Dombeyoideae) as sister to the rest of the Malvadendrina clade, but the plastid DNA tree strongly supported Durio (Helicteroideae) in this position. Intraspecific Ks plots indicated that all sampled taxa, except Theobroma (Byttnerioideae), Corchorus (Grewioideae), and Dombeya (Dombeyoideae), have experienced whole genome multiplications (WGMs). Quartet analysis suggested WGMs were shared by Malvoideae‐Bombacoideae and Sterculioideae‐Tilioideae, but did not resolve whether these are shared with each other or Helicteroideae (Durio). Gene tree reconciliation and Bayesian concordance analysis suggested a complex history. Alternative hypotheses are suggested, each involving two independent autotetraploid and one allopolyploid event. They differ in that one entails an allopolyploid origin for the Durio lineage, whereas the other invokes an allopolyploid origin for Malvoideae‐Bombacoideae. We highlight the need for more genomic information in the Malvaceae and improved methods to resolve complex evolutionary histories that may include allopolyploidy, incomplete lineage sorting, and variable rates of gene and genome evolution.
While there is no doubt among evolutionary biologists that all living species, or merely all living species within a particular group (e.g., animals), share descent from a common ancestor, formal statistical methods for evaluating common ancestry from aligned DNA sequence data have received criticism. One primary criticism is that prior methods take sequence similarity as evidence for common ancestry while ignoring other potential biological causes of similarity, such as functional constraints. We present a new statistical framework to test separate ancestry versus common ancestry that avoids this pitfall. We illustrate the efficacy of our approach using a recently published large molecular alignment to examine common ancestry of all primates (including humans). We find overwhelming evidence against separate ancestry and in favor of common ancestry for orders and families of primates. We also find overwhelming evidence that humans share a common ancestor with other primate species. The novel statistical methods presented here provide formal means to test separate ancestry versus common ancestry from aligned DNA sequence data while accounting for functional constraints that limit nucleotide base usage on a site-by-site basis.