High-throughput sequencing is helping biologists to overcome the difficulties of inferring the phylogenies of recently diverged taxa. The present study analyzes the phylogenetic signal of genomic regions with different inheritance patterns using genome skimming and ddRAD-seq in a species-rich Andean genus (Diplostephium) and its allies. We analyzed the complete nuclear ribosomal cistron, the complete chloroplast genome, a partial mitochondrial genome, and a nuclear-ddRAD matrix separately with phylogenetic methods. We applied several approaches to understand the causes of incongruence among datasets, including simulations and the detection of introgression using the D-statistic (ABBA-BABA test). We found significant incongruence among the nuclear, chloroplast, and mitochondrial phylogenies. The strong signal of hybridization found by simulations and the D-statistic among genera and inside the main clades of Diplostephium indicates that reticulate evolution is a main cause of phylogenetic incongruence. Our results add evidence for a major role of reticulate evolution in events of rapid diversification. Hybridization and introgression confound chloroplast and mitochondrial phylogenies in relation to the species tree as a result of the uniparental inheritance of these genomic regions. Practical implications regarding the prevalence of hybridization are discussed in relation to the phylogenetic method.
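The D-statistic (ABBA-BABA test) named in the abstract above compares the counts of two discordant site patterns in a four-taxon arrangement (((P1, P2), P3), O): D = (ABBA - BABA) / (ABBA + BABA). The following is a minimal sketch, assuming one aligned haploid sequence per taxon; the function name `d_statistic` and the toy sequences are hypothetical and are not taken from the study, whose own implementation is not described here.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Return Patterson's D for four aligned sequences, or None if no
    ABBA or BABA sites are present.

    Assumes the topology (((P1, P2), P3), O) and treats the outgroup
    base as the ancestral state "A" at each site.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    abba = 0
    baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        # Skip gaps and ambiguity codes.
        if not {a, b, c, o} <= valid:
            continue
        if a == o and b == c and b != o:
            # P1 ancestral, P2 and P3 share the derived base: ABBA
            abba += 1
        elif b == o and a == c and a != o:
            # P2 ancestral, P1 and P3 share the derived base: BABA
            baba += 1

    total = abba + baba
    if total == 0:
        return None
    return (abba - baba) / total


# Toy example (made-up sequences): one BABA site and three ABBA sites.
d = d_statistic("GAAA", "AGGG", "GGGG", "AAAA")
print(d)
```

A positive value indicates an excess of derived alleles shared between P2 and P3, a negative value an excess shared between P1 and P3. In practice the significance of D is usually assessed with a resampling procedure such as a block jackknife, which this sketch omits.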
Evolutionary relationships among plants have been inferred primarily using chloroplast data. To date, no study has comprehensively examined the plastome for gene tree conflict. Using a broad sampling of angiosperm plastomes, we characterize gene tree conflict among plastid genes at various time scales and explore correlates to conflict (e.g., evolutionary rate, gene length, molecule type). We uncover notable gene tree conflict against a backdrop of largely uninformative genes. We find that alignment length and tree length are strong predictors of concordance, and that nucleotides outperform amino acids. Of the most commonly used markers, matK greatly outperforms rbcL; however, the rarely used gene rpoC2 is the top-performing gene in every analysis. We find that rpoC2 reconstructs angiosperm phylogeny as well as the entire concatenated set of protein-coding chloroplast genes. Our results suggest that longer genes are superior for phylogeny reconstruction. The alleviation of some conflict through the use of nucleotides suggests that stochastic and systematic error is likely the root of most of the observed conflict, but further research on biological conflict within the plastome is warranted given documented cases of heteroplasmic recombination. We suggest that researchers should filter genes for topological concordance when performing downstream comparative analyses on phylogenetic data, even when using chloroplast genomes.
Premise
Large genomic data sets offer the promise of resolving historically recalcitrant species relationships. However, different methodologies can yield conflicting results, especially when clades have experienced ancient, rapid diversification. Here, we analyzed the ancient radiation of Ericales and explored sources of uncertainty related to species tree inference, conflicting gene tree signal, and the inferred placement of gene and genome duplications.
Methods
We used a hierarchical clustering approach, with tree‐based homology and orthology detection, to generate six filtered phylogenomic matrices consisting of data from 97 transcriptomes and genomes. Support for species relationships was inferred from multiple lines of evidence including shared gene duplications, gene tree conflict, gene‐wise edge‐based analyses, concatenation, and coalescent‐based methods, and is summarized in a consensus framework.
Results
Our consensus approach supported a topology largely concordant with previous studies, but suggests that the data are not capable of resolving several ancient relationships because of lack of informative characters, sensitivity to methodology, and extensive gene tree conflict correlated with paleopolyploidy. We found evidence of a whole‐genome duplication before the radiation of all or most ericalean families, and demonstrate that tree topology and heterogeneous evolutionary rates affect the inferred placement of genome duplications.
Conclusions
We provide several hypotheses regarding the history of Ericales, and confidently resolve most nodes, but demonstrate that a series of ancient divergences are unresolvable with these data. Whether paleopolyploidy is a major source of the observed phylogenetic conflict warrants further investigation.
Reconstructing species trees from multi-loci datasets is becoming a standard practice in phylogenetics. Nevertheless, access to high-throughput sequencing may be costly, especially with studies of many samples. The potential high cost makes a priori assessments desirable in order to make informed decisions about sequencing. We generated twelve transcriptomes for ten species of the Brazil nut family (Lecythidaceae), identified a set of putatively orthologous nuclear loci and evaluated, in silico, their phylogenetic utility using genome skimming data of 24 species. We designed the markers using MarkerMiner, and developed a script, GoldFinder, to efficiently sub-select the best markers for sequencing. We captured, in silico, all 354 designed nuclear loci and performed a maximum likelihood phylogenetic analysis on the concatenated sequence matrix. We also calculated individual gene trees with maximum likelihood and used them for a coalescent-based species tree inference. Both analyses resulted in almost identical topologies. However, our nuclear-loci phylogenies were strongly incongruent with a published plastome phylogeny, suggesting that plastome data alone is not sufficient for species tree estimation. Our results suggest that using hundreds of nuclear markers (i.e. 354) will significantly improve the Lecythidaceae species tree. The framework described here will be useful, generally, for developing markers for species tree inference.
High species richness and endemism in tropical mountains are recognized as major contributors to the latitudinal diversity gradient. The processes underlying mountain speciation, however, are largely untested. The prevalence of steep ecogeographic gradients and the geographic isolation of populations by topographic features are predicted to promote speciation in mountains. We evaluate these processes in a species-rich Neotropical genus of understory herbs that range from the lowlands to montane forests and have higher species richness in topographically complex regions. We ask whether climatic niche divergence, geographic isolation, and pollination shifts differ between mountain-influenced and lowland Amazonian sister pairs inferred from a 756-gene phylogeny. Neotropical Costus ancestors diverged in Central America during a period of mountain formation in the last 3 million years with later colonization of Amazonia. Although climatic divergence, geographic isolation, and pollination shifts are prevalent in general, these factors do not differ between mountain-influenced and Amazonian sister pairs. Despite higher climatic niche and species diversity in the mountains, speciation modes in Costus appear similar across regions. Thus, greater species richness in tropical mountains may reflect differences in colonization history, diversification rates, or the prevalence of rapidly evolving plant life forms, rather than differences in speciation mode.
Puttick et al. (2017, 20162290 (doi:10.1098/rspb.2016.2290)) performed a simulation study to compare accuracy among methods of inferring phylogeny from discrete morphological characters. They report that a Bayesian implementation of the Mk model (Lewis 2001, 913-925 (doi:10.1080/106351501753462876)) was most accurate (but with low resolution), while a maximum-likelihood (ML) implementation of the same model was least accurate. They conclude by strongly advocating that Bayesian implementations of the Mk model should be the default method of analysis for such data. While we appreciate the authors' attempt to investigate the accuracy of alternative methods of analysis, their conclusion is based on an inappropriate comparison of the ML point estimate, which does not consider confidence, with the Bayesian consensus, which incorporates estimation credibility into the summary tree. Using simulation, we demonstrate that ML and Bayesian estimates are concordant when confidence and credibility are comparably reflected in summary trees, a result expected from statistical theory. We therefore disagree with the conclusions of Puttick et al. and consider their prescription of any default method to be poorly founded. Instead, we recommend caution and thoughtful consideration of the model or method being applied to a morphological dataset.