Summary 1. Understanding the dynamics of speciation, extinction and phenotypic evolution is a central challenge in evolutionary biology. Here, we present BAMMtools, an R package for the analysis and visualization of macroevolutionary dynamics on phylogenetic trees. BAMMtools is a companion package to BAMM, an open-source program for reversible-jump MCMC analyses of diversification and trait evolution. 2. Functions in BAMMtools operate directly on output from the BAMM program. The package is oriented towards reconstructing and visualizing changes in evolutionary rates through time and across clades in a Bayesian statistical framework. 3. BAMMtools enables users to extract credible sets of diversification shifts and to identify diversification histories with the maximum a posteriori probability. Users can compare the fit of alternative diversification models using Bayes factors and by directly comparing model posterior probabilities. 4. By providing a robust framework for quantifying uncertainty in macroevolutionary dynamics, BAMMtools will facilitate inference on the complex mixture of processes that have shaped the distribution of species and phenotypes across the tree of life.
There is a lack of consensus on how next-generation sequence (NGS) data should be considered for phylogenetic and phylogeographic estimates, with some studies excluding loci with missing data, whereas others include them, even when sequences are missing from a large number of individuals. Here, we use simulations, focusing specifically on RAD (Restriction site Associated DNA) sequences, to highlight some of the unforeseen consequences of excluding missing data from next-generation sequencing. Specifically, we show that in addition to the obvious effects associated with reducing the amount of data used to make historical inferences, the decisions we make about missing data (such as the minimum number of individuals with a sequence for a locus to be included in the study) also impact the types of loci sampled for a study. In particular, as the tolerance for missing data becomes more stringent, the mutational spectrum represented in the sampled loci becomes truncated such that loci with the highest mutation rates are disproportionately excluded. This effect is exacerbated further by factors involved in the preparation of the genomic library (i.e., the use of reduced representation libraries, as well as the coverage) and the taxonomic diversity represented in the library (i.e., the level of divergence among the individuals). We demonstrate that the intuitive appeal of being conservative by removing loci may be misguided. [Next-generation sequencing; phylogenetic; phylogeography; RADseq; RADtags; species delimitation.]
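The truncation effect described in this abstract can be illustrated with a toy simulation. The sketch below is not the authors' simulation design. It assumes a deliberately simple, hypothetical dropout model: each locus has its own substitution rate, a locus is lost in an individual when a mutation hits its restriction site, and dropout is independent among individuals (real dropout is shared through ancestry). The function name, parameter names, and default values are all illustrative assumptions.

```python
import math
import random


def simulate_retained_rates(n_loci=2000, n_individuals=20, min_individuals=10,
                            divergence=1.0, site_length=8, seed=1):
    """Toy model of locus filtering under a missing-data threshold.

    Returns (all_rates, retained_rates), where retained_rates holds the
    per-site rates of loci sequenced in at least `min_individuals`.
    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    all_rates, retained_rates = [], []
    for _ in range(n_loci):
        # Assumed rate heterogeneity: exponential with mean 0.02 per site.
        rate = rng.expovariate(1 / 0.02)
        # Assumed dropout probability: at least one mutation in the
        # restriction site over the given divergence.
        p_dropout = 1 - math.exp(-rate * site_length * divergence)
        # Count individuals in which the locus is still recovered.
        present = sum(rng.random() > p_dropout for _ in range(n_individuals))
        all_rates.append(rate)
        if present >= min_individuals:
            retained_rates.append(rate)
    return all_rates, retained_rates


if __name__ == "__main__":
    for threshold in (10, 15, 20):
        _, kept = simulate_retained_rates(min_individuals=threshold)
        mean_rate = sum(kept) / len(kept)
        print(f"min_individuals={threshold}: kept {len(kept)} loci, "
              f"mean rate {mean_rate:.4f}")
```

Under these assumptions, raising `min_individuals` retains fewer loci and lowers the mean rate of the retained set, which is the qualitative pattern the abstract reports.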
Diabetic kidney disease, or diabetic nephropathy (DN), is a major complication of diabetes and the leading cause of end-stage renal disease (ESRD) that requires dialysis treatment or kidney transplantation. In addition to the decrease in the quality of life, DN accounts for a large proportion of the excess mortality associated with type 1 diabetes (T1D). Whereas the degree of glycemia plays a pivotal role in DN, a subset of individuals with poorly controlled T1D do not develop DN. Furthermore, strong familial aggregation supports genetic susceptibility to DN. However, the genes and the molecular mechanisms behind the disease remain poorly understood, and current therapeutic strategies rarely result in reversal of DN. In the GEnetics of Nephropathy: an International Effort (GENIE) consortium, we have undertaken a meta-analysis of genome-wide association studies (GWAS) of T1D DN comprising ∼2.4 million single nucleotide polymorphisms (SNPs) imputed in 6,691 individuals. After additional genotyping of 41 top ranked SNPs representing 24 independent signals in 5,873 individuals, combined meta-analysis revealed association of two SNPs with ESRD: rs7583877 in the AFF3 gene (P = 1.2×10⁻⁸) and an intergenic SNP on chromosome 15q26 between the genes RGMA and MCTP2, rs12437854 (P = 2.0×10⁻⁹). Functional data suggest that AFF3 influences renal tubule fibrosis via the transforming growth factor-beta (TGF-β1) pathway. The strongest association with DN as a primary phenotype was seen for an intronic SNP in the ERBB4 gene (rs7588550, P = 2.1×10⁻⁷), a gene with type 2 diabetes DN differential expression and in the same intron as a variant with cis-eQTL expression of ERBB4. All these detected associations represent new signals in the pathogenesis of DN.
Rates of species diversification vary widely across the tree of life and there is considerable interest in identifying organismal traits that correlate with rates of speciation and extinction. However, it has been challenging to develop methodological frameworks for testing hypotheses about trait-dependent diversification that are robust to phylogenetic pseudoreplication and to directionally biased rates of character change. We describe a semi-parametric test for trait-dependent diversification that explicitly requires replicated associations between character states and diversification rates to detect effects. To use the method, diversification rates are reconstructed across a phylogenetic tree with no consideration of character states. A test statistic is then computed to measure the association between species-level traits and the corresponding diversification rate estimates at the tips of the tree. The empirical value of the test statistic is compared to a null distribution that is generated by structured permutations of evolutionary rates across the phylogeny. The test is applicable to binary discrete characters as well as continuous-valued traits and can accommodate extremely sparse sampling of character states at the tips of the tree. We apply the test to several empirical data sets and demonstrate that the method has acceptable Type I error rates.
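The permutation logic in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' implementation. It assumes a single set of tip rate estimates, assumes tips are grouped into hypothetical "rate regimes" that share one rate, and uses Spearman's rank correlation as the test statistic. The structured permutation shuffles rates among regimes rather than among tips, so a single clade cannot produce a significant result on its own. All function and variable names are illustrative.

```python
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation; returns 0.0 when either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def structured_permutation_test(traits, regime_of_tip, regime_rates,
                                n_perm=1000, seed=0):
    """Two-sided permutation p-value for a trait-rate association.

    traits        : trait value per tip
    regime_of_tip : regime label per tip (hypothetical grouping)
    regime_rates  : dict mapping regime label -> diversification rate
    """
    rng = random.Random(seed)
    regimes = sorted(regime_rates)
    rates = [regime_rates[r] for r in regimes]
    observed = spearman(traits, [regime_rates[r] for r in regime_of_tip])
    hits = 0
    for _ in range(n_perm):
        shuffled = rates[:]
        rng.shuffle(shuffled)  # permute rates among regimes, not tips
        permuted = dict(zip(regimes, shuffled))
        null_stat = spearman(traits, [permuted[r] for r in regime_of_tip])
        if abs(null_stat) >= abs(observed):
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

With only one regime, every permutation reproduces the observed statistic and the p-value is 1. This mirrors the abstract's requirement that effects be supported by replicated associations between character states and rates.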
Discord in the estimated gene trees among loci can be attributed to both the process of mutation and incomplete lineage sorting. Effectively modeling these two sources of variation, mutational and coalescent variance, provides two distinct challenges for phylogenetic studies. Despite extensive investigation on mutational models for gene-tree estimation over the past two decades and recent attention to modeling of the coalescent process for phylogenetic estimation, the effects of these two variances have yet to be evaluated simultaneously. Here, we partition the effects of mutational and coalescent processes on phylogenetic accuracy by comparing the accuracy of species trees estimated from gene trees (i.e., the actual coalescent genealogies) with that of species trees estimated from estimated gene trees (i.e., trees estimated from nucleotide sequences, which contain both coalescent and mutational variance). Not only is there a significant contribution of both mutational and coalescent variance to errors in species-tree estimates, but the relative magnitude of the effects on the accuracy of species-tree estimation also differs systematically depending on 1) the timing of divergence, 2) the sampling design, and 3) the method used for species-tree estimation. These findings explain why using more information contained in gene trees (e.g., topology and branch lengths as opposed to just topology) does not necessarily translate into pronounced gains in accuracy, highlighting the strengths and limits of different methods for species-tree estimation. Differences in accuracy scores between methods for different sampling regimes also emphasize that it would be a mistake to assume that more computationally intensive species-tree estimation procedures will always provide better estimates of species trees. To the contrary, the performance of a method depends not only on the method per se but also on the compatibilities between the input genetic data and the method as determined by the relative impact of mutational and coalescent variance.