Inferring an accurate evolutionary tree of life requires high-quality alignments of molecular sequence data sets from large numbers of species. However, this task is often difficult, slow, and idiosyncratic, especially when the sequences are highly diverged or include high rates of insertions and deletions (collectively known as indels). We present SATé (simultaneous alignment and tree estimation), an automated method to quickly and accurately estimate both DNA alignments and trees with the maximum likelihood criterion. In our study, it improved tree and alignment accuracy compared to the best two-phase methods currently available for data sets of up to 1000 sequences, showing that coestimation can be both rapid and accurate in phylogenetic studies.
The origin of a new diploid species by means of hybridization requires the successful merger of differentiated parental species' genomes. To study this process, the genomic composition of three experimentally synthesized hybrid lineages was compared with that of an ancient hybrid species. The genomic composition of the synthesized and ancient hybrids was concordant (rs = 0.68, P < 0.0001), indicating that selection to a large extent governs hybrid species formation. Further, nonrandom rates of introgression and significant associations among unlinked markers in each of the three synthesized hybrid lineages imply that interactions between coadapted parental species' genes constrain the genomic composition of hybrid species.
Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted, but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present SATé-II, a modification of the original SATé algorithm (which we now call SATé-I) that improves upon it in speed and in phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller, more closely related subsets; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. More generally, SATé is a metamethod: it takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based on these improved alignments are more accurate than trees based on the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees, and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results.
First, we show that the optimization problem in which a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense: for every input, every tree optimizes the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is due not to the use of ML as an optimization criterion for choosing the best tree/alignment pair but rather to the particular divide-and-conquer realignment techniques employed.
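The following sketch does not reproduce the paper's proof. It illustrates only one ingredient of why treating gaps as missing data can make likelihood uninformative about the tree: under Jukes-Cantor, a column containing exactly one non-gap character has likelihood 1/4 on every tree with every branch length, so such columns cannot distinguish trees. It uses standard Felsenstein pruning; the tuple tree encoding and the function names are our own hypothetical choices.

```python
import math

BASES = "ACGT"


def jc_prob(i, j, t):
    """Jukes-Cantor probability that base i becomes base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial(node, column):
    """Felsenstein pruning: conditional likelihood of each base at `node`.

    A leaf is a name (str); an internal node is (left, t_left, right, t_right).
    A gap '-' is treated as missing data, i.e. every base has conditional likelihood 1.
    """
    if isinstance(node, str):
        ch = column[node]
        if ch == "-":
            return [1.0] * 4
        return [1.0 if b == ch else 0.0 for b in BASES]
    left, t_left, right, t_right = node
    pl, pr = partial(left, column), partial(right, column)
    out = []
    for bi in BASES:
        from_left = sum(jc_prob(bi, bj, t_left) * pl[j] for j, bj in enumerate(BASES))
        from_right = sum(jc_prob(bi, bj, t_right) * pr[j] for j, bj in enumerate(BASES))
        out.append(from_left * from_right)
    return out


def column_likelihood(tree, column):
    """Site likelihood using the uniform JC root distribution (1/4 per base)."""
    return sum(0.25 * x for x in partial(tree, column))


# Two different topologies with arbitrary branch lengths (hypothetical).
T1 = (("a", 0.1, "b", 0.2), 0.3, "c", 0.5)
T2 = ("a", 1.7, ("b", 0.05, "c", 2.0), 0.4)
lonely = {"a": "-", "b": "G", "c": "-"}  # one non-gap character in the column
print(column_likelihood(T1, lonely), column_likelihood(T2, lonely))  # both ≈ 0.25
```

By contrast, a column with two or more non-gap characters, such as `{"a": "A", "b": "A", "c": "-"}`, yields different likelihoods under `T1` and `T2`, which is why only the alignment choice can make the score depend on the tree.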
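The abstract says only that SATé-II uses a different divide-and-conquer strategy that yields smaller, more closely related subsets. As a hedged sketch, assume the strategy recursively deletes a centroid edge of the current guide tree (the edge whose removal splits its leaves most evenly) until every subset has at most `max_size` sequences. Aligning each subset and merging the subalignments are not shown. The function names and the toy eight-leaf tree are hypothetical.

```python
from collections import defaultdict


def build_tree(edges):
    """Undirected adjacency lists for an unrooted tree given as an edge list."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return dict(adj)


def side(adj, u, v):
    """Nodes reachable from u after deleting edge (u, v); valid because adj is a tree."""
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if (x == u and y == v) or y in seen:
                continue
            seen.add(y)
            stack.append(y)
    return seen


def centroid_decompose(adj, leaf_set, max_size):
    """Recursively delete the edge that most evenly splits leaf_set until
    every subset has at most max_size leaves (requires max_size >= 1)."""
    if len(leaf_set) <= max_size:
        return [sorted(leaf_set)]
    best = None
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                part = side(adj, u, v)
                n_u = len(part & leaf_set)
                score = max(n_u, len(leaf_set) - n_u)
                if best is None or score < best[0]:
                    best = (score, part)
    part = best[1]
    other = set(adj) - part
    # Restrict adjacency to each component; the deleted edge disappears.
    sub_a = {x: [y for y in adj[x] if y in part] for x in part}
    sub_b = {x: [y for y in adj[x] if y in other] for x in other}
    return (centroid_decompose(sub_a, leaf_set & part, max_size)
            + centroid_decompose(sub_b, leaf_set & other, max_size))


# Hypothetical balanced guide tree on leaves a..h.
edges = [("a", "p1"), ("b", "p1"), ("c", "p2"), ("d", "p2"),
         ("e", "p3"), ("f", "p3"), ("g", "p4"), ("h", "p4"),
         ("p1", "q1"), ("p2", "q1"), ("p3", "q2"), ("p4", "q2"), ("q1", "q2")]
subsets = centroid_decompose(build_tree(edges), set("abcdefgh"), max_size=2)
print(sorted(subsets))  # [['a', 'b'], ['c', 'd'], ['e', 'f'], ['g', 'h']]
```

Because each cut follows the tree, leaves that end up in the same subset are close relatives on the guide tree, which matches the abstract's description of "smaller, more closely related subsets".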
Until recently, rigorously reconstructing the many hybrid speciation events in plants has not been practical because of the limited number of molecular markers available for plant phylogenetic reconstruction and the lack of good, biologically based methods for inferring reticulation (network) events. This situation should change rapidly with the development of multiple nuclear markers for phylogenetic reconstruction and new methods for reconstructing reticulate evolution. These developments will necessitate a much greater incorporation of population genetics into phylogenetic reconstruction than has been common. Population genetic events such as gene duplication coupled with lineage sorting and meiotic and sexual recombination have always had the potential to affect phylogenetic inference. For tree reconstruction, these problems are usually minimized by using uniparental markers and nuclear markers that undergo rapid concerted evolution. Because reconstruction of reticulate speciation events will require nuclear markers that lack these characteristics, effects of population genetics on phylogenetic inference will need to be addressed directly. Current models and methods that allow hybrid speciation to be detected and reconstructed are discussed, with a focus on how lineage sorting and meiotic and sexual recombination affect network reconstruction. Approaches that would allow inference of phylogenetic networks in their presence are suggested.
Belowground vertical community composition and maximum rooting depth of the Edwards Plateau of central Texas were determined by using DNA sequence variation to identify roots from caves 5-65 m deep. Roots from caves were identified by comparing their DNA sequences for the internal transcribed spacer (ITS) region of the 18S-26S ribosomal DNA repeat against a reference ITS database developed for woody plants of the region. Sequencing the ITS provides, to our knowledge, the first universal method for identifying plant roots. At least six tree species in the system grew roots deeper than 5 m, but only the evergreen oak, Quercus fusiformis, was found below 10 m. The maximum rooting depth for the ecosystem was ≈25 m. ¹⁸O isotopic signatures for stem water of Q. fusiformis confirmed water uptake from 18 m underground. The availability of resources at depth, coupled with small surface pools of water and nutrients, may explain the occurrence of deep roots in this and other systems.

Plant rooting depth influences the hydrology, biogeochemistry, and primary productivity of terrestrial ecosystems (1-7). Progress in determining the maximum rooting depth of species and in identifying the resources taken up at depth is limited by several factors. Access to the soil is difficult, particularly in rocky soils and in deeper layers. In addition, no universal method exists for identifying roots obtained from the soil, especially when only fine roots are available (8-10). There is considerable variation in maximum rooting depth and root biomass distributions, which affects the functioning of ecosystems (11-15). For example, in eastern Amazonia, water uptake from 2- to 8-m soil depths contributes more than three-fourths of the transpiration of evergreen forest in the dry season and helps maintain an evergreen canopy on >1 million km² of tropical forest (1, 16). Characteristics of roots and the soil are also needed in models of biosphere-atmosphere interactions (17).
A comparison of 14 land surface parameterizations concluded that rooting depth and vertical soil characteristics, which determine the amount of water available to plants and how its uptake is partitioned among soil layers, were the most important factors explaining scatter among models for simulated transpiration (18, 19). Conclusions were similar for global soil-moisture dynamics (20, 21). Global simulations of net primary productivity and transpiration increased 16% and 18%, respectively, when optimized rooting depths incorporated soil water deeper than 1 m (22).

We developed a method for identifying roots based on DNA sequence variation and applied it to roots collected from caves 5-65 m deep to determine the belowground community structure and maximum rooting depth of the 100,000-km² Edwards Plateau of central Texas. The Edwards Plateau and other karst regions in Texas cover one-fifth of the state, with >3,000 caves identified to date (23, 24). Karst systems, in general, cover 7-10% of land surface area globally and supply a quarter of the earth's population ...
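The root-identification step compares an unknown root's ITS sequence against a reference ITS database and assigns it to the best-matching species. The text does not specify the comparison procedure, so the sketch below swaps in a deliberately simple stand-in, k-mer Jaccard similarity, rather than the authors' method. The reference labels and sequences are hypothetical toy fragments, not real ITS data.

```python
def kmers(seq, k=4):
    """Set of overlapping k-mers in a DNA string (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def identify(query, references, k=4):
    """Return (best_label, score) for the reference most similar to the query."""
    q = kmers(query, k)
    scores = {label: jaccard(q, kmers(ref, k)) for label, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy fragments standing in for a reference ITS database.
REFERENCES = {
    "species_A": "TCGAAACCTGCAAAGCAGAACGACCCGCGAACTTGTT",
    "species_B": "TCGAGACCTGCCTAGCAGAATGACCCGTGAACATGTA",
    "species_C": "ACGTTGCATGCAAGGCTTACGATCCGATAACGTGCAA",
}

# A short "root" read that is a substring of species_A's fragment.
label, score = identify("CCTGCAAAGCAGAACGACCCGCGAAC", REFERENCES)
print(label, round(score, 3))
```

A real pipeline would more likely use pairwise alignment or a database search tool and a minimum-similarity threshold so that roots from species absent from the reference set are not forced onto the nearest entry.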
Phylogenetic networks model the evolutionary history of sets of organisms when events such as hybrid speciation and horizontal gene transfer occur. In spite of their widely acknowledged importance in evolutionary biology, phylogenetic networks have so far been studied mostly for specific data sets. We present a general definition of phylogenetic networks in terms of directed acyclic graphs (DAGs) and a set of conditions. Further, we distinguish between model networks and reconstructible ones and characterize the effect of extinction and taxon sampling on the reconstructibility of the network. Simulation studies are a standard technique for assessing the performance of phylogenetic methods. A main step in such studies entails quantifying the topological error between the model and inferred phylogenies. While many measures of tree topological accuracy have been proposed, none exist for phylogenetic networks. Previously, we proposed the first such measure, which applied only to a restricted class of networks. In this paper, we extend that measure to apply to all networks, and prove that it is a metric on the space of phylogenetic networks. Our results allow for the systematic study of existing network methods, and for the design of new accurate ones.
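To make the DAG view of a network concrete, the sketch below stores a rooted network as a child-list dictionary, checks two basic conditions (exactly one root, no directed cycle, via Kahn's algorithm), and reports reticulation nodes (in-degree at least 2). It then compares two networks by the symmetric difference of their hardwired clusters (the leaf sets below each node). This cluster comparison is a simpler, swapped-in measure, not the measure the abstract extends; unlike that measure, it need not distinguish every pair of distinct networks. The function names and example networks are hypothetical.

```python
def _nodes(children):
    return set(children) | {c for cs in children.values() for c in cs}


def validate(children):
    """Check single root and acyclicity; return (root, sorted reticulation nodes)."""
    nodes = _nodes(children)
    indeg = {n: 0 for n in nodes}
    for cs in children.values():
        for c in cs:
            indeg[c] += 1
    roots = [n for n in nodes if indeg[n] == 0]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    # Kahn's algorithm: every node is removed if and only if the graph is acyclic.
    remaining, queue, removed = dict(indeg), list(roots), 0
    while queue:
        n = queue.pop()
        removed += 1
        for c in children.get(n, []):
            remaining[c] -= 1
            if remaining[c] == 0:
                queue.append(c)
    if removed != len(nodes):
        raise ValueError("graph has a directed cycle")
    return roots[0], sorted(n for n in nodes if indeg[n] >= 2)


def clusters(children):
    """Set of hardwired clusters: for each node, the leaves reachable from it."""
    memo = {}

    def below(n):
        if n not in memo:
            kids = children.get(n, [])
            memo[n] = frozenset([n]) if not kids else frozenset().union(*(below(c) for c in kids))
        return memo[n]

    return {below(n) for n in _nodes(children)}


def cluster_distance(n1, n2):
    """Number of clusters present in exactly one of the two networks."""
    validate(n1)
    validate(n2)
    return len(clusters(n1) ^ clusters(n2))


# Hypothetical network with one reticulation node h (leaf b has two ancestries).
NETWORK = {"r": ["x", "y"], "x": ["a", "h"], "y": ["h", "c"], "h": ["b"]}
TREE = {"r": ["x", "c"], "x": ["a", "b"]}
print(validate(NETWORK))                # ('r', ['h'])
print(cluster_distance(NETWORK, TREE))  # 1: only cluster {b, c} differs
```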
The catalytic subunit of cellulose synthase is shown to be associated with the putative cellulose-synthesizing complex (rosette terminal complex [TC]) in vascular plants. The catalytic subunit domain of cotton cellulose synthase was cloned by polymerase chain reaction, using a specific primer designed from a rice expressed sequence tag (D41261) and a cDNA library from cotton fibers 24 days postanthesis as the template. The catalytic region of cotton cellulose synthase was expressed in Escherichia coli, and polyclonal antisera were produced. Colloidal gold coupled to goat anti-rabbit secondary antibodies provided a tag for visualizing the catalytic region of cellulose synthase by transmission electron microscopy. With a freeze-fracture replica labeling technique, the antibodies specifically localized to rosette TCs in the plasma membrane on the P-fracture face. Antibodies did not specifically label any structures on the E-fracture face. Significantly more immune probes than preimmune probes labeled the rosette TCs (a probe was counted as labeling a TC when its gold particle lay 20 nm or closer to the edge of the TC). These experiments confirm the long-held hypothesis that cellulose synthase is a component of the rosette TC in vascular plants, proving that the enzyme complex resides within the structure first described by freeze fracture in 1980. In addition, this study provides independent proof that the CelA gene is in fact one of the genes for cellulose synthase in vascular plants.