Abstract: Background: Identifying orthologous molecular markers that potentially resolve relationships at and below species level has been a major challenge in molecular phylogenetics over the past decade. Non-coding regions of nuclear low- or single-copy markers are a vast and promising source of data providing information for shallow-scale phylogenetics. Taking advantage of public transcriptome data from the One Thousand Plant Project (1KP), we developed a genome-scale mining strategy for recovering potentially ortholog…
“…The two next-best methods are the signal-and-noise analysis (Townsend et al 2012) which is very similar in its requirements to PR, and Fisher information (Goldman 1998). The signal-noise analysis has been used recently for experimental design in phylogenomic studies at shallow and deep levels of divergence (Mendoza et al 2015; Prum et al 2015) and given our results certainly is a promising approach. Although statistically very sound, Fisher information is computationally prohibitive for trees of more than about 10 taxa and requires specification of a full model tree.…”
The accumulation of genome-scale molecular data sets for nonmodel taxa brings us ever closer to resolving the tree of life of all living organisms. However, despite the depth of data available, a number of studies that each used thousands of genes have reported conflicting results. The focus of phylogenomic projects must thus shift to more careful experimental design. Even though we still have a limited understanding of what are the best predictors of the phylogenetic informativeness of a gene, there is wide agreement that one key factor is its evolutionary rate; but there is no consensus as to whether the rates derived as optimal in various analytical, empirical, and simulation approaches have any general applicability. We here use simulations to infer optimal rates in a set of realistic phylogenetic scenarios with varying tree sizes, numbers of terminals, and tree shapes. Furthermore, we study the relationship between the optimal rate and rate variation among sites and among lineages. Finally, we examine how well the predictions made by a range of experimental design methods correlate with the observed performance in our simulations. We find that the optimal level of divergence is surprisingly robust to differences in taxon sampling and even to among-site and among-lineage rate variation as often encountered in empirical data sets. This finding encourages the use of methods that rely on a single optimal rate to predict a gene’s utility.
Focusing on correct recovery either of the most basal node in the phylogeny or of the entire topology, the optimal rate is about 0.45 substitutions from root to tip in average Yule trees and about 0.2 in difficult trees with short basal and long apical branches, but all rates leading to divergence levels between about 0.1 and 0.5 perform reasonably well. Testing the performance of six methods that can be used to predict a gene’s utility against our simulation results, we find that the probability of resolution, signal-noise analysis, and Fisher information are good predictors of phylogenetic informativeness, but they require specification of at least part of a model tree. Likelihood quartet mapping also shows very good performance but only requires sequence alignments and is thus applicable without making assumptions about the phylogeny. Despite being the most commonly used methods for experimental design, geometric quartet mapping and the integration of phylogenetic informativeness curves perform rather poorly in our comparison. Instead of derived predictors of phylogenetic informativeness, we suggest that the number of sites in a gene that evolve at near-optimal rates (as inferred here) could be used directly to prioritize genes for phylogenetic inference. In combination with measures of model fit, especially with respect to compositional biases and among-site and among-lineage rate variation, such an approach has the potential to greatly improve marker choice and should be tested on empirical data.
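The suggestion to prioritize genes by their number of near-optimal sites can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a per-site root-to-tip divergence (in substitutions per site) has already been estimated for each gene by some other tool, and it uses the 0.1 to 0.5 range quoted in the abstract as the "near-optimal" window. The function names, the inclusive window bounds, and the example values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: root-to-tip divergence window reported in the abstract as
# performing "reasonably well" (about 0.1 to 0.5 substitutions per site).
NEAR_OPTIMAL_WINDOW: Tuple[float, float] = (0.1, 0.5)


def count_near_optimal_sites(
    site_divergences: Sequence[float],
    window: Tuple[float, float] = NEAR_OPTIMAL_WINDOW,
) -> int:
    """Count sites whose root-to-tip divergence lies inside the window (inclusive)."""
    low, high = window
    return sum(1 for d in site_divergences if low <= d <= high)


def rank_genes(
    genes: Dict[str, Sequence[float]],
    window: Tuple[float, float] = NEAR_OPTIMAL_WINDOW,
) -> List[Tuple[str, int]]:
    """Rank genes by number of near-optimal sites, highest first.

    Ties are broken alphabetically by gene name so the order is deterministic.
    """
    counts = {name: count_near_optimal_sites(divs, window) for name, divs in genes.items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-site divergence estimates for three genes.
example_genes = {
    "geneA": [0.02, 0.30, 0.45, 1.20],
    "geneB": [0.15, 0.20, 0.40, 0.50],
    "geneC": [0.01, 0.90],
}

for name, n_sites in rank_genes(example_genes):
    print(f"{name}\t{n_sites}")
```

A raw count favors long genes; dividing by alignment length would instead rank genes by the proportion of near-optimal sites. Which is preferable depends on the sequencing design and is not specified in the abstract.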
“…We set the smoothing parameter to 0, allowing the full range of rate variation among branches. As there are no fossils recorded for Peperomia, we decided to set the tree height to 1 to avoid temporal bias (e.g., Granados Mendoza et al, 2015). All trees were subsequently rescaled assigning branch tips to time 0 and root to time 1.…”
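The rescaling step described in this citation statement (tips at time 0, root at time 1) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the software the cited study used: the `Node` class and the function names are hypothetical, and the tree is assumed to be rooted with non-negative branch lengths (for an ultrametric tree every tip then ends up at exactly time 0).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal tree node; `length` is the branch length to the parent."""
    name: str
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def height(node: Node) -> float:
    """Largest path length from `node` down to any descendant tip."""
    if not node.children:
        return 0.0
    return max(child.length + height(child) for child in node.children)


def _scale(node: Node, factor: float) -> None:
    """Multiply every branch length in the subtree by `factor`."""
    node.length *= factor
    for child in node.children:
        _scale(child, factor)


def rescale_to_unit_height(root: Node) -> Node:
    """Rescale branch lengths in place so that the root sits at height 1."""
    h = height(root)
    if h <= 0.0:
        raise ValueError("tree has zero height and cannot be rescaled")
    _scale(root, 1.0 / h)
    return root


# Example: the ultrametric tree ((A:2,B:2):3,C:5) has height 5 before rescaling.
a, b, c = Node("A", 2.0), Node("B", 2.0), Node("C", 5.0)
tree = Node("root", 0.0, [Node("AB", 3.0, [a, b]), c])
rescale_to_unit_height(tree)
print(round(height(tree), 6))
```

Because only relative branch lengths are retained, node ages after rescaling are fractions of the total tree height rather than absolute times.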
The species-rich genus Peperomia (Black Pepper relatives) is the only genus among early diverging angiosperms where epiphytism evolved. The majority of fruits of Peperomia release sticky secretions or exhibit hook-shaped appendages indicative of epizoochorous dispersal, which is in contrast to other flowering plants, where epiphytes are generally characterized by fruit morphological adaptations for anemochory or endozoochory. We investigate fruit characters using Cryo-SEM. Comparative phylogenetic analyses are applied for the first time to include life form and fruit character information to study diversification in Peperomia. Likelihood ratio tests uncover correlated character evolution. We demonstrate that diversification within Peperomia is not homogenous across its phylogeny, and that net diversification rates increase by twofold within the most species-rich subgenus. In contrast to former land plant studies that provide general evidence for increased diversification in epiphytic lineages, we demonstrate that the evolution of epiphytism within Peperomia predates the diversification shift. An epiphytic-dependent diversification is only observed for the background phylogeny. An elevated frequency of life form transitions between epiphytes and terrestrials and thus evolutionary flexibility of life forms is uncovered to coincide with the diversification shift. The evolution of fruits showing dispersal related structures is key to diversification in the foreground region of the phylogeny and postdates the evolution of epiphytism. We conclude that the success of Peperomia, measured in species numbers, is likely the result of enhanced vertical and horizontal dispersal ability and life form flexibility but not the evolution of epiphytism itself.
“…These include the conserved ortholog set (COSII) in euasterids (Wu et al., ), shared single‐copy nuclear genes (APVO SSC genes) in angiosperms (Duarte et al., ), the pentatricopeptide repeat (PPR) gene family in angiosperms (Yuan et al., ), other low‐copy nuclear genes conserved across angiosperms (Zhang et al., ), and universal markers developed for individual families (e.g., Chapman et al., ; Curto et al., ). The utility of these general locus sets in comparison with taxon‐specific locus sets in targeted sequence capture and phylogenomics has not been evaluated (but see Granados Mendoza et al., ; Buddenhagen et al., ; Léveillé‐Bourret et al., ).…”
Premise of the Study: Targeted sequence capture can be used to efficiently gather sequence data for large numbers of loci, such as single‐copy nuclear loci. Most published studies in plants have used taxon‐specific locus sets developed individually for a clade using multiple genomic and transcriptomic resources. General locus sets can also be developed from loci that have been identified as single‐copy and have orthologs in large clades of plants. Methods: We identify and compare a taxon‐specific locus set and three general locus sets (conserved ortholog set [COSII], shared single‐copy nuclear [APVO SSC] genes, and pentatricopeptide repeat [PPR] genes) for targeted sequence capture in Buddleja (Scrophulariaceae) and outgroups. We evaluate their performance in terms of assembly success, sequence variability, and resolution and support of inferred phylogenetic trees. Results: The taxon‐specific locus set had the most target loci. Assembly success was high for all locus sets in Buddleja samples. For outgroups, general locus sets had greater assembly success. Taxon‐specific and PPR loci had the highest average variability. The taxon‐specific data set produced the best‐supported tree, but all data sets showed improved resolution over previous non‐sequence capture data sets. Discussion: General locus sets can be a useful source of sequence capture targets, especially if multiple genomic resources are not available for a taxon.