Abstract. Determining whether speciation and extinction rates depend on the state of a particular character has been of long-standing interest to evolutionary biologists. To assess the effect of a character on diversification rates using likelihood methods requires that we be able to calculate the probability that a group of extant species would have evolved as observed, given a particular model of the character's effect. Here we describe how to calculate this probability for a phylogenetic tree and a two-state (binary) character under a simple model of evolution (the "BiSSE" model, binary-state speciation and extinction). The model involves six parameters, specifying two speciation rates (rate when the lineage is in state 0; rate when in state 1), two extinction rates (when in state 0; when in state 1), and two rates of character state change (from 0 to 1, and from 1 to 0). Using these probability calculations, we can do maximum likelihood inference to estimate the model's parameters and perform hypothesis tests (e.g., is the rate of speciation elevated for one character state over the other?). We demonstrate the application of the method using simulated data with known parameter values.
[Birth-death process; branching process; cladogenesis; extinction; key innovation; macroevolution; phylogeny; speciation; speciose; statistical inference.]
The pattern of branching of a phylogenetic tree contains information about the processes of speciation and extinction (Nee et al., 1994b; Barraclough and Nee, 2001). For instance, extinction may be revealed by an upturn near the present in a plot of species lineages through time (Nee et al., 1994a). Of special interest is whether phylogenetic trees can be used to demonstrate that certain characteristics of a lineage, such as ecological niche or mating system, affect the rate of speciation or extinction (Mitter et al., 1988; Barraclough et al., 1998; Gittleman and Purvis, 1998).
Sister-clade analyses are often used to answer these questions (Mitter et al., 1988; Farrell et al., 1991; Barraclough et al., 1998; Vamosi and Vamosi, 2005). For example, Mitter et al. (1988) showed that herbivorous clades of beetles were more speciose than their carnivorous sister clades; this pattern indicates that herbivory confers a higher speciation rate, a lower extinction rate, or both. Comparison of sister clades is a simple and relatively nonparametric approach (Slowinski and Guyer, 1993; Barraclough et al., 1996) and has had a broad impact on macroevolutionary studies. However, it has some drawbacks that prompt us to explore alternatives. Sister-clade comparisons cannot distinguish differential speciation from differential extinction (Barraclough and Nee, 2001). Also, when the character of interest is a simple categorical variable, clades with mixed states cannot easily participate in the test. As a result, the choice of clades can be arbitrary, and information is discarded when collapsing the phylogenetic tree into a set of clade pairs. In principle it should be possible to find a method considering the who...
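The six-parameter model in the abstract above can be made concrete with a short sketch. The excerpt does not show the equations, so the code below assumes the usual backward-time formulation of BiSSE with complete sampling of extant species: along a branch, D0 and D1 are the probabilities of the observed descendants given that the lineage is in state 0 or 1, and E0 and E1 are the probabilities that a lineage in that state leaves no surviving descendants. The function names, the parameter ordering, and the fixed-step Runge-Kutta integrator are illustrative choices, not part of the published method.

```python
import math

def bisse_rhs(state, lam0, lam1, mu0, mu1, q01, q10):
    """Right-hand side of the assumed BiSSE branch equations (time runs tip -> root)."""
    d0, d1, e0, e1 = state
    dd0 = -(lam0 + mu0 + q01) * d0 + q01 * d1 + 2.0 * lam0 * e0 * d0
    dd1 = -(lam1 + mu1 + q10) * d1 + q10 * d0 + 2.0 * lam1 * e1 * d1
    de0 = mu0 - (lam0 + mu0 + q01) * e0 + q01 * e1 + lam0 * e0 * e0
    de1 = mu1 - (lam1 + mu1 + q10) * e1 + q10 * e0 + lam1 * e1 * e1
    return (dd0, dd1, de0, de1)

def _step(state, scale, deriv):
    """Return state + scale * deriv, element by element."""
    return tuple(s + scale * d for s, d in zip(state, deriv))

def integrate_branch(tip_state, length, params, steps=1000):
    """Integrate (D0, D1, E0, E1) from a tip in `tip_state` back along a branch.

    `params` is the hypothetical ordering (lam0, lam1, mu0, mu1, q01, q10).
    Uses classical fourth-order Runge-Kutta with a fixed step size.
    """
    # At a sampled tip the observed state has probability 1 and extinction 0.
    state = (1.0, 0.0, 0.0, 0.0) if tip_state == 0 else (0.0, 1.0, 0.0, 0.0)
    h = length / steps
    for _ in range(steps):
        k1 = bisse_rhs(state, *params)
        k2 = bisse_rhs(_step(state, 0.5 * h, k1), *params)
        k3 = bisse_rhs(_step(state, 0.5 * h, k2), *params)
        k4 = bisse_rhs(_step(state, h, k3), *params)
        state = tuple(
            s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state

if __name__ == "__main__":
    # Example values only; they are not estimates from any data set.
    print(integrate_branch(0, 2.0, (0.3, 0.1, 0.05, 0.05, 0.02, 0.02)))
```

A full likelihood calculation would also combine the two daughter branches at each node and treat the root, which this sketch leaves out. With no extinction and no character change the D0 equation reduces to exponential decay at the speciation rate, which gives a simple sanity check on the integrator.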
Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips: the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.
phylogeny | taxonomy | tree of life | biodiversity | synthesis
The realization that all organisms on Earth are related by common descent (1) was one of the most profound insights in scientific history. The goal of reconstructing the tree of life is one of the most daunting challenges in biology. The scope of the problem is immense: there are ∼1.8 million named species, and most species have yet to be described (2-4).
Despite decades of effort and thousands of phylogenetic studies on diverse clades, we lack a comprehensive tree of life, or even a summary of our current knowledge. One reason for this shortcoming is lack of data. GenBank contains DNA sequences for ∼411,000 species, only 22% of estimated named species. Although some gene regions (e.g., rbcL, 16S, COI) have been widely sequenced across some lineages, they are insufficient for resolving relationships across the entire tree (5). Most recognized species have never been included in a phylogenetic analysis because no appropriate molecular or morphological data have been collected.
There is extensive publication of new phylogenies, data, and inference methods, but little attention to synthesis. We therefore focus on constructing, to our knowledge, the first comprehensive tree of life through the integration of published phylogenies with taxonomic information. Phylogenies by systematists with expertise in particular taxa likely represent the best estimates of relationships for individual clades. By focusing on trees instead of raw data, we avoid issues of dataset assembly (6). However, most published phylogenies are available only as jour...
Most phylogenetically based statistical methods for the analysis of quantitative or continuously varying phenotypic traits assume that variation within species is absent or at least negligible, which is unrealistic for many traits. Within-species variation has several components. Differences among populations of the same species may represent either phylogenetic divergence or direct effects of environmental factors that differ among populations (phenotypic plasticity). Within-population variation also contributes to within-species variation and includes sampling variation, instrument-related error, low repeatability caused by fluctuations in behavioral or physiological state, variation related to age, sex, season, or time of day, and individual variation within such categories. Here we develop techniques for analyzing phylogenetically correlated data to include within-species variation, or "measurement error" as it is often termed in the statistical literature. We derive methods for (i) univariate analyses, including measurement of "phylogenetic signal," (ii) correlation and principal components analysis for multiple traits, (iii) multiple regression, and (iv) inference of "functional relations," such as reduced major axis (RMA) regression. The methods are capable of incorporating measurement error that differs for each data point (mean value for a species or population), but they can be modified for special cases in which less is known about measurement error (e.g., when one is willing to assume something about the ratio of measurement error in two traits). We show that failure to incorporate measurement error can lead to both biased and imprecise (unduly uncertain) parameter estimates. Even previous methods that are thought to account for measurement error, such as conventional RMA regression, can be improved by explicitly incorporating measurement error and phylogenetic correlation. 
We illustrate these methods with examples and simulations and provide Matlab programs.
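As a rough illustration of what incorporating measurement error that differs for each data point can look like, the sketch below estimates a phylogenetic mean by generalized least squares. It is not the authors' Matlab code. It assumes Brownian-motion evolution, a known rate `sigma2`, a matrix `c` of shared branch lengths between species, and a known standard error `se[i]` for each species mean; all of these names are hypothetical. The measurement-error variances are simply added to the diagonal of the phylogenetic covariance matrix.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def gls_mean(y, c, sigma2, se):
    """GLS estimate of the phylogenetic mean with per-species measurement error.

    Total covariance is V = sigma2 * C + diag(se_i ** 2); the estimate is
    (1' V^-1 y) / (1' V^-1 1).
    """
    n = len(y)
    v = [
        [sigma2 * c[i][j] + (se[i] ** 2 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    vinv_y = solve(v, list(y))
    vinv_one = solve(v, [1.0] * n)
    return sum(vinv_y) / sum(vinv_one)

if __name__ == "__main__":
    # Toy three-species example; the numbers are made up for illustration.
    c = [[1.0, 0.6, 0.0], [0.6, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(gls_mean([2.0, 2.4, 3.1], c, sigma2=1.0, se=[0.1, 0.5, 0.2]))
```

Species measured with more error receive less weight in the estimate. In practice the evolutionary rate would be estimated jointly with the mean rather than fixed, so this shows only the structure of the covariance matrix.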
MetaCyc (https://MetaCyc.org) is a comprehensive reference database of metabolic pathways and enzymes from all domains of life. It contains more than 2570 pathways derived from >54 000 publications, making it the largest curated collection of metabolic pathways. The data in MetaCyc are strictly evidence-based and richly curated, resulting in an encyclopedic reference tool for metabolism. MetaCyc is also used as a knowledge base for generating thousands of organism-specific Pathway/Genome Databases (PGDBs), which are available in the BioCyc (https://BioCyc.org) and other PGDB collections. This article provides an update on the developments in MetaCyc during the past two years, including the expansion of data and addition of new features.
MetaCyc (MetaCyc.org) is a comprehensive reference database of metabolic pathways and enzymes from all domains of life. It contains 2749 pathways derived from more than 60 000 publications, making it the largest curated collection of metabolic pathways. The data in MetaCyc are evidence-based and richly curated, resulting in an encyclopedic reference tool for metabolism. MetaCyc is also used as a knowledge base for generating thousands of organism-specific Pathway/Genome Databases (PGDBs), which are available in BioCyc.org and other genomic portals. This article provides an update on the developments in MetaCyc during September 2017 to August 2019, up to version 23.1. Some of the topics that received intensive curation during this period include cobamides biosynthesis, sterol metabolism, fatty acid biosynthesis, lipid metabolism, carotenoid metabolism, protein glycosylation, antibiotics and cytotoxins biosynthesis, siderophore biosynthesis, bioluminescence, vitamin K metabolism, brominated compound metabolism, plant secondary metabolism and human metabolism. Other additions include modifications to the GlycanBuilder software that enable displaying glycans using symbolic representation, improved graphics and fonts for web displays, improvements in the PathoLogic component of Pathway Tools, and the optional addition of regulatory information to pathway diagrams.
BioCyc.org is a microbial genome Web portal that combines thousands of genomes with additional information inferred by computer programs, imported from other databases and curated from the biomedical literature by biologist curators. BioCyc also provides an extensive range of query tools, visualization services and analysis software. Recent advances in BioCyc include an expansion in the content of BioCyc in terms of both the number of genomes and the types of information available for each genome; an expansion in the amount of curated content within BioCyc; and new developments in the BioCyc software tools including redesigned gene/protein pages and metabolite pages; new search tools; a new sequence-alignment tool; a new tool for visualizing groups of related metabolic pathways; and a facility called SmartTables, which enables biologists to perform analyses that previously would have required a programmer's assistance.
Background: There has been a considerable increase in studies investigating rates of diversification and character evolution, with one of the promising techniques being the BiSSE method (binary state speciation and extinction). This study uses simulations under a variety of different sample sizes (number of tips) and asymmetries of rate (speciation, extinction, character change) to determine BiSSE’s ability to test hypotheses, and investigate whether the method is susceptible to confounding effects.
Results: We found that the power of the BiSSE method is severely affected by both sample size and high tip ratio bias (one character state dominates among observed tips). Sample size and high tip ratio bias also reduced accuracy and precision of parameter estimation, and resulted in the inability to infer which rate asymmetry caused the excess of a character state. In low tip ratio bias scenarios with appropriate tip sample size, BiSSE accurately estimated the rate asymmetry causing character state excess, avoiding the issue of confounding effects.
Conclusions: Based on our findings, we recommend that future studies utilizing BiSSE that have fewer than 300 terminals and/or have datasets where high tip ratio bias is observed (i.e., fewer than 10% of species are of one character state) should be extremely cautious with the interpretation of hypothesis testing results.
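The recommendation above can be turned into a simple screening step before fitting a model. The sketch below is a hypothetical helper, not something supplied by the study: it takes a list of binary tip states and reports which of the two cautionary conditions apply, using the thresholds quoted in the abstract (300 terminals and a 10% minority state) as default arguments.

```python
def bisse_caution_flags(tip_states, min_tips=300, min_minority_fraction=0.10):
    """Return a list of warnings for a binary (0/1) tip-state data set.

    The defaults follow the thresholds quoted in the abstract; the function
    name and the wording of the messages are illustrative only.
    """
    n = len(tip_states)
    ones = sum(1 for s in tip_states if s == 1)
    # Fraction of tips carrying the rarer of the two states.
    minority = min(ones, n - ones) / n if n else 0.0
    flags = []
    if n < min_tips:
        flags.append("small tree")
    if minority < min_minority_fraction:
        flags.append("high tip ratio bias")
    return flags

if __name__ == "__main__":
    # 400 tips, 20 of them in state 1: large enough, but strongly biased.
    print(bisse_caution_flags([0] * 380 + [1] * 20))
```

An empty list means neither condition was triggered. It does not by itself show that the analysis has adequate power.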