Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing these sites prior to phylogenetic inference. However, a recent study reported that doing so can worsen inference, underscoring the need for alternative alignment trimming strategies. Here, we introduce ClipKIT, an alignment trimming software that, rather than identifying and removing putatively phylogenetically uninformative sites, instead aims to identify and retain parsimony-informative sites, which are known to be phylogenetically informative. To test the efficacy of ClipKIT, we examined the accuracy and support of phylogenies inferred from 14 different alignment trimming strategies, including those implemented in ClipKIT, across nearly 140,000 alignments from a broad sampling of evolutionary histories. Phylogenies inferred from ClipKIT-trimmed alignments are accurate, robust, and time saving. Furthermore, ClipKIT consistently outperformed other trimming methods across diverse datasets, suggesting that strategies based on identifying and retaining parsimony-informative sites provide a robust framework for alignment trimming.
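The notion of a parsimony-informative site can be illustrated with a minimal sketch. It uses the standard definition: a column is parsimony-informative when at least two different character states each occur in at least two sequences. This is not ClipKIT's implementation. The function names, the dictionary-based alignment format, and the choice to ignore only `-` and `?` as missing data are assumptions made for illustration.

```python
from collections import Counter

# Assumption: only these symbols are treated as gaps or missing data.
MISSING = {"-", "?"}


def is_parsimony_informative(column):
    """Return True if at least two states each occur in at least two sequences."""
    counts = Counter(c.upper() for c in column if c not in MISSING)
    states_seen_twice = sum(1 for n in counts.values() if n >= 2)
    return states_seen_twice >= 2


def keep_parsimony_informative(alignment):
    """Keep only parsimony-informative columns of an alignment.

    `alignment` is a hypothetical {name: aligned_sequence} mapping.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    keep = [
        i for i in range(length)
        if is_parsimony_informative([s[i] for s in seqs])
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


toy = {
    "taxon_a": "ACGT-",
    "taxon_b": "ACGTA",
    "taxon_c": "ATGAA",
    "taxon_d": "GTGAC",
}
trimmed = keep_parsimony_informative(toy)
print(trimmed)
```

In the toy alignment only the second and fourth columns are retained, because each contains two states that are both shared by two taxa.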
Identifying our most distant animal relatives has emerged as one of the most challenging problems in phylogenetics. This debate has major implications for our understanding of the origin of multicellular animals and of the earliest events in animal evolution, including the origin of the nervous system. Some analyses identify sponges as our most distant animal relatives (Porifera-sister hypothesis), and others identify comb jellies (Ctenophora-sister hypothesis). These analyses vary in many respects, making it difficult to interpret previous tests of these hypotheses. To gain insight into why different studies yield different results, an important next step in the ongoing debate, we systematically test these hypotheses by synthesizing 15 previous phylogenomic studies and performing new standardized analyses under consistent conditions with additional models. We find that Ctenophora-sister is recovered across the full range of examined conditions, and Porifera-sister is recovered in some analyses under narrow conditions when most outgroups are excluded and site-heterogeneous CAT models are used. We additionally find that the number of categories in site-heterogeneous models is sufficient to explain the Porifera-sister results. Furthermore, our cross-validation analyses show CAT models that recover Porifera-sister have hundreds of additional categories and fail to fit significantly better than site-heterogeneous models with far fewer categories. Systematic and standardized testing of diverse phylogenetic models suggests that we should be skeptical of Porifera-sister results both because they are recovered under such narrow conditions and because the models in these conditions fit the data no better than other models that recover Ctenophora-sister.
The nature of the visual representation for words has been fiercely debated for over 150 y. We used direct brain stimulation, pre- and postsurgical behavioral measures, and intracranial electroencephalography to provide support for, and elaborate upon, the visual word form hypothesis. This hypothesis states that activity in the left midfusiform gyrus (lmFG) reflects visually organized information about words and word parts. In patients with electrodes placed directly in their lmFG, we found that disrupting lmFG activity through stimulation, and later surgical resection in one of the patients, led to impaired perception of whole words and letters. Furthermore, using machine-learning methods to analyze the electrophysiological data from these electrodes, we found that information contained in early lmFG activity was consistent with an orthographic similarity space. Finally, the lmFG contributed to at least two distinguishable stages of word processing, an early stage that reflects gist-level visual representation sensitive to orthographic statistics, and a later stage that reflects more precise representation sufficient for the individuation of orthographic word forms. These results provide strong support for the visual word form hypothesis and demonstrate that across time the lmFG is involved in multiple stages of orthographic representation.
(1), whereas Wernicke firmly rejected that notion, proposing that reading only necessitates representations of visual letters that feed forward into the language system (2). Similarly, the modern debate revolves around whether there is a visual word form system that becomes specialized for the representation of orthographic knowledge (e.g., the visual forms of letter combinations, morphemes, and whole words) (1, 3, 4).
One side of the debate is characterized by the view that the brain possesses a visual word form area that is "a major, reproducible site of orthographic knowledge" (5), whereas the other side disavows any need for reading-specific visual specialization, arguing instead for neurons that are "general purpose analyzers of visual forms" (6).
The visual word form hypothesis has attracted great scrutiny because the historical novelty of reading makes it highly unlikely that evolution has created a brain system specialized for reading; this places the analysis of visual word forms in stark contrast to other processes that are thought to have specialized neural systems, such as social, verbal language, or emotional processes, which can be seen in our evolutionary ancestors. Thus, testing the word form hypothesis is critical not only for understanding the neural basis of reading, but also for understanding how the brain organizes information that must be learned through extensive experience and for which we have no evolutionary bias.
Advances in neuroimaging and lesion mapping have focused the modern debate surrounding the visual word form hypothesis on the left midfusiform gyrus (lmFG). This focus reflects widespread agreement that the lmFG region plays a critical role ...
Motivation: Diverse disciplines in biology process and analyze multiple sequence alignments (MSAs) and phylogenetic trees to evaluate their information content, infer evolutionary events and processes, and predict gene function. However, automated processing of MSAs and trees remains a challenge due to the lack of a unified toolkit. To fill this gap, we introduce PhyKIT, a toolkit for the UNIX shell environment with 30 functions that process MSAs and trees, including but not limited to estimation of mutation rate, evaluation of sequence composition biases, calculation of the degree of violation of a molecular clock, and collapsing bipartitions (internal branches) with low support.
Results: To demonstrate the utility of PhyKIT, we detail three use cases: (1) summarizing information content in MSAs and phylogenetic trees for diagnosing potential biases in sequence or tree data; (2) evaluating gene-gene covariation of evolutionary rates to identify functional relationships, including novel ones, among genes; and (3) identifying lack of resolution events or polytomies in phylogenetic trees, which are suggestive of rapid radiation events or lack of data. We anticipate PhyKIT will be useful for processing, examining, and deriving biological meaning from increasingly large phylogenomic datasets.
Availability: PhyKIT is freely available on GitHub (https://github.com/JLSteenwyk/PhyKIT), PyPi (https://pypi.org/project/phykit/), and the Anaconda Cloud (https://anaconda.org/JLSteenwyk/phykit) under the MIT license with extensive documentation and user tutorials (https://jlsteenwyk.com/PhyKIT).
Supplementary information: Supplementary data are available on figshare (doi: 10.6084/m9.figshare.13118600) and are available at Bioinformatics online.
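The idea behind gene-gene covariation of evolutionary rates can be sketched as a correlation between per-branch rates of two genes. The following is a rough illustration only, not PhyKIT's implementation or command-line interface. It assumes that branch lengths for each gene tree have already been matched to the same branches of a species tree, that relative rates are obtained by dividing by the species-tree branch lengths, and that Pearson correlation is the measure of covariation. All names and numbers are hypothetical.

```python
from math import sqrt


def relative_rates(gene_branch_lengths, species_branch_lengths):
    """Divide each gene-tree branch length by the matching species-tree length."""
    if len(gene_branch_lengths) != len(species_branch_lengths):
        raise ValueError("branch lists must be the same length")
    return [g / s for g, s in zip(gene_branch_lengths, species_branch_lengths)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant rates")
    return cov / sqrt(var_x * var_y)


# Hypothetical branch lengths, listed in the same branch order for every tree.
species = [0.10, 0.20, 0.30, 0.40]
gene_a = [0.02, 0.08, 0.18, 0.32]
gene_b = [0.01, 0.04, 0.09, 0.16]

r = pearson(relative_rates(gene_a, species), relative_rates(gene_b, species))
print(f"covariation (Pearson r): {r:.3f}")
```

A coefficient near 1 would suggest that the two genes speed up and slow down along the same branches, which is the kind of signal used to propose functional relationships.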
Background: Symbiotic relationships between microbes and their hosts are widespread and diverse, often providing protection or nutrients, and may be either obligate or facultative. However, the genetic mechanisms allowing organisms to maintain host-symbiont associations at the molecular level are still mostly unknown, and in the case of bacterial-animal associations, most genetic studies have focused on adaptations and mechanisms of the bacterial partner. The gutless tubeworms (Siboglinidae, Annelida) are obligate hosts of chemoautotrophic endosymbionts (except for Osedax, which houses heterotrophic Oceanospirillales), which rely on the sulfide-oxidizing symbionts for nutrition and growth. Whereas several siboglinid endosymbiont genomes have been characterized, genomes of hosts and their adaptations to this symbiosis remain unexplored.
Results: Here, we present and characterize adaptations of the cold seep-dwelling tubeworm Lamellibrachia luymesi, one of the longest-lived solitary invertebrates. We sequenced the worm’s ~688-Mb haploid genome with an overall completeness of ~95% and discovered that L. luymesi lacks many genes essential in amino acid biosynthesis, obligating the host to rely on products provided by symbionts. Interestingly, the host is known to carry hydrogen sulfide to thiotrophic endosymbionts using hemoglobin. We also found an expansion of hemoglobin B1 genes, many of which possess a free cysteine residue that is hypothesized to function in sulfide binding. Contrary to previous analyses, the sulfide binding mediated by zinc ions is not conserved across tubeworms. Thus, the sulfide-binding mechanisms in siboglinids need to be further explored, and B1 globins might play a more important role than previously thought. Our comparative analyses also suggest the Toll-like receptor pathway may be essential for tolerance/sensitivity to symbionts and pathogens.
Several genes related to the worm’s unique life history, which are known to play important roles in apoptosis, cell proliferation, and aging, were also identified. Last, molecular clock analyses based on phylogenomic data suggest modern siboglinid diversity originated 267 mya (± 70 my) and support previous hypotheses indicating a Late Mesozoic or Cenozoic origin of approximately 50–126 mya for vestimentiferans.
Conclusions: Here, we elucidate several specific adaptations along various molecular pathways that link phenome to genome to improve understanding of holobiont evolution. Our findings of adaptation in genomic mechanisms to reducing environments likely extend to other chemosynthetic symbiotic systems.
Ascomycota, the largest and most well-studied phylum of fungi, contains three subphyla: Saccharomycotina (budding yeasts), Pezizomycotina (filamentous fungi), and Taphrinomycotina (fission yeasts). Despite its importance, we lack a comprehensive genome-scale phylogeny or understanding of the similarities and differences in the mode of genome evolution within this phylum. By examining 1107 genomes from Saccharomycotina (332), Pezizomycotina (761), and Taphrinomycotina (14) species, we inferred a robust genome-wide phylogeny that resolves several contentious relationships and estimated that the Ascomycota last common ancestor likely originated in the Ediacaran period. Comparisons of genomic properties revealed that Saccharomycotina and Pezizomycotina differ greatly in their genome properties and enabled inference of the direction of evolutionary change. The Saccharomycotina typically have smaller genomes, lower guanine-cytosine contents, lower numbers of genes, and higher rates of molecular sequence evolution compared with Pezizomycotina. These results provide a robust evolutionary framework for understanding the diversity and ecological lifestyles of the largest fungal phylum.