Bacteria with two cell membranes (diderms) have evolved complex systems for protein secretion. These systems were extensively studied in some model bacteria, but the characterisation of their diversity has lagged behind due to lack of standard annotation tools. We built online and standalone computational tools to accurately predict protein secretion systems and related appendages in bacteria with LPS-containing outer membranes. They consist of models describing the systems’ components and genetic organization to be used with MacSyFinder to search for T1SS-T6SS, T9SS, flagella, Type IV pili and Tad pili. We identified ~10,000 candidate systems in bacterial genomes, where T1SS and T5SS were by far the most abundant and widespread. All these data are made available in a public database. The recently described T6SSiii and T9SS were restricted to Bacteroidetes, and T6SSii to Francisella. The T2SS, T3SS, and T4SS were frequently encoded in single-copy in one locus, whereas most T1SS were encoded in two loci. The secretion systems of diderm Firmicutes were similar to those found in other diderms. Novel systems may remain to be discovered, since some clades of environmental bacteria lacked all known protein secretion systems. Our models can be fully customized, which should facilitate the identification of novel systems.
Type 3 secretion systems (T3SSs) are essential components of two complex bacterial machineries: the flagellum, which drives cell motility, and the non-flagellar T3SS (NF-T3SS), which delivers effectors into eukaryotic cells. Yet the origin, specialization, and diversification of these machineries remained unclear. We developed computational tools to identify homologous components of the two systems and to discriminate between them. Our analysis of >1,000 genomes identified 921 T3SSs, including 222 NF-T3SSs. Phylogenomic and comparative analyses of these systems argue that the NF-T3SS arose from an exaptation of the flagellum, i.e. the recruitment of part of the flagellum structure for the evolution of the new protein delivery function. The reconstructed chronology of the exaptation process comprises at least two steps. An intermediate ancestral form of NF-T3SS, whose descendants still exist in Myxococcales, lacked elements that are essential for motility and included a subset of NF-T3SS features. We argue that this ancestral version was involved in protein translocation. A second major step in the evolution of NF-T3SSs occurred via recruitment of secretins to the NF-T3SS, an event that occurred at least three times from different systems. In Rhizobiales, a partial homologous gene replacement of the secretin resulted in two genes of complementary function. Acquisition of a secretin was followed by the rapid adaptation of the resulting NF-T3SSs to multiple, distinct eukaryotic cell envelopes where they became key in parasitic and mutualistic associations between prokaryotes and eukaryotes. Our work elucidates major steps of the evolutionary scenario leading to extant NF-T3SSs. It demonstrates how molecular evolution can convert one complex molecular machine into a second, equally complex machine by successive deletions, innovations, and recruitment from other molecular systems.
Motivation: Biologists often wish to use their knowledge of a few experimental models of a given molecular system to identify homologs in genomic data. We developed a generic tool for this purpose. Results: Macromolecular System Finder (MacSyFinder) provides a flexible framework to model the properties of molecular systems (cellular machinery or pathway) including their components, evolutionary associations with other systems and genetic architecture. Modelled features also include functional analogs, and the multiple uses of the same component by different systems. Models are used to search for molecular systems in complete genomes or in unstructured data like metagenomes. The components of the systems are searched by sequence similarity using Hidden Markov model (HMM) protein profiles. The assignment of hits to a given system is decided based on compliance with the content and organization of the system model. A graphical interface, MacSyView, facilitates the analysis of the results by showing overviews of component content and genomic context. To exemplify the use of MacSyFinder we built models to detect and classify CRISPR-Cas systems following a previously established classification. We show that MacSyFinder makes it easy to define an accurate “Cas-finder” using publicly available protein profiles. Availability and Implementation: MacSyFinder is a standalone application implemented in Python. It requires Python 2.7, Hmmer and makeblastdb (version 2.2.28 or higher). It is freely available with its source code under a GPLv3 license at https://github.com/gem-pasteur/macsyfinder. It is compatible with all platforms supporting Python and Hmmer/makeblastdb. The “Cas-finder” (models and HMM profiles) is distributed as a compressed tarball archive as Supporting Information.
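The idea of assigning profile hits to a system according to its content and organization can be illustrated with a minimal sketch. This is not MacSyFinder's code or API: the `Hit` class, the functions `cluster_hits` and `find_systems`, the component names, and all thresholds are hypothetical, chosen only to show how co-localization and a minimum component count might be combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene: str      # hypothetical component name matched by a protein profile
    position: int  # rank of the gene along the replicon


def cluster_hits(hits, max_gap):
    """Group hits whose consecutive positions are at most max_gap genes apart."""
    clusters, current = [], []
    for hit in sorted(hits, key=lambda h: h.position):
        if current and hit.position - current[-1].position > max_gap:
            clusters.append(current)
            current = []
        current.append(hit)
    if current:
        clusters.append(current)
    return clusters


def find_systems(hits, mandatory, accessory, min_mandatory, min_total, max_gap):
    """Keep clusters that contain enough distinct mandatory and total components."""
    systems = []
    for cluster in cluster_hits(hits, max_gap):
        genes = {h.gene for h in cluster}
        n_mandatory = len(genes & mandatory)
        n_total = n_mandatory + len(genes & accessory)
        if n_mandatory >= min_mandatory and n_total >= min_total:
            systems.append(cluster)
    return systems


# Toy input: three neighbouring components and one isolated hit.
hits = [Hit("compA", 10), Hit("compB", 11), Hit("compC", 13), Hit("compA", 200)]
systems = find_systems(
    hits,
    mandatory={"compA", "compB"},
    accessory={"compC"},
    min_mandatory=2,
    min_total=3,
    max_gap=5,
)
print([[h.gene for h in s] for s in systems])
```

In this toy example the three neighbouring hits form one candidate system, while the isolated `compA` hit is discarded because it fails the component-count thresholds on its own.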
Processes of molecular innovation require tinkering and shifting in the function of existing genes. How this occurs in terms of molecular evolution at long evolutionary scales remains poorly understood. Here, we analyse the natural history of a vast group of membrane-associated molecular systems in Bacteria and Archaea, the type IV filament (TFF) superfamily, that diversified into systems involved in flagellar or twitching motility, adhesion, protein secretion, and DNA uptake. The phylogeny of the thousands of detected systems suggests they may have been present in the last universal common ancestor. From there, two lineages, one bacterial and one archaeal, diversified by multiple gene duplications, gene fissions and deletions, and accretion of novel components. Surprisingly, we find that the ‘tight adherence’ (Tad) systems originated from the interkingdom transfer from Archaea to Bacteria of a system resembling the ‘EppA-dependent’ (Epd) pilus and were associated with the acquisition of a secretin. The phylogeny and content of ancestral systems suggest that initial bacterial pili were engaged in cell motility and/or DNA uptake. In contrast, specialised protein secretion systems arose several times independently and much later in natural history. The functional diversification of the TFF superfamily was accompanied by genetic rearrangements with implications for genetic regulation and horizontal gene transfer: systems encoded in fewer loci were more frequently exchanged between taxa. This may have contributed to their rapid evolution and spread across Bacteria and Archaea. Hence, the evolutionary history of the superfamily reveals an impressive catalogue of molecular evolution mechanisms that resulted in remarkable functional innovation and specialisation from a relatively small set of components.
Conjugation of DNA through a type IV secretion system (T4SS) drives horizontal gene transfer. Yet little is known about the diversity of these nanomachines. We previously found that T4SS can be divided into eight classes based on the phylogeny of the only ubiquitous protein of T4SS (VirB4). Here, we use an ab initio approach to identify protein families systematically and specifically associated with VirB4 in each class. We built profiles for these proteins and used them to scan 2262 genomes for the presence of T4SS. Our analysis led to the identification of thousands of occurrences of 116 protein families for a total of 1623 T4SS. Importantly, our profiles almost always identified the essential genes of well-studied T4SS. This allowed us to build a database with the largest number of T4SS described to date. Using profile–profile alignments, we reveal many new cases of homology between components of distant classes of T4SS. We mapped these similarities on the T4SS phylogenetic tree and thus obtained the patterns of acquisition and loss of these protein families in the history of T4SS. The identification of the key VirB4-associated proteins paves the way toward experimental analysis of poorly characterized T4SS classes.
Ammonia-oxidizing archaea (AOA) are among the most abundant microorganisms and key players in the global nitrogen and carbon cycles. They share a common energy metabolism but represent a heterogeneous group with respect to their environmental distribution and adaptations, growth requirements, and genome contents. We report here the genome and proteome of Nitrososphaera viennensis EN76, the type species of the archaeal class Nitrososphaeria of the phylum Thaumarchaeota encompassing all known AOA. N. viennensis is a soil organism with a 2.52-Mb genome and 3,123 predicted protein-coding genes. Proteomic analysis revealed that nearly 50% of the predicted genes were translated under standard laboratory growth conditions. Comparison with genomes of closely related species of the predominantly terrestrial Nitrososphaerales as well as the more streamlined marine Nitrosopumilales [Candidatus (Ca.) order] and the acidophile "Ca. Nitrosotalea devanaterra" revealed a core genome of AOA comprising 860 genes, which allowed for the reconstruction of central metabolic pathways common to all known AOA and expressed in the N. viennensis and "Ca. Nitrosopelagicus brevis" proteomes. Concomitantly, we were able to identify candidate proteins for as yet unidentified crucial steps in central metabolisms. In addition to unraveling aspects of core AOA metabolism, we identified specific metabolic innovations associated with the Nitrososphaerales mediating growth and survival in the soil milieu, including the capacity for biofilm formation, cell surface modifications and cell adhesion, and carbohydrate conversions as well as detoxification of aromatic compounds and drugs. ammonia oxidation | proteomics | archaea | comparative genomics | biofilm
The timing of the evolution of microbial life has largely remained elusive due to the scarcity of the prokaryotic fossil record and the confounding effects of the exchange of genes among possibly distant species. The history of gene transfer events, however, is not a series of individual oddities; it records which lineages were concurrent and thus provides information on the timing of species diversification. Here, we use a probabilistic model of genome evolution that accounts for differences between gene phylogenies and the species tree as a series of duplication, transfer, and loss events to reconstruct chronologically ordered species phylogenies. Using simulations we show that we can robustly recover accurate chronologically ordered species phylogenies in the presence of gene tree reconstruction errors and realistic rates of duplication, transfer, and loss. Using genomic data we demonstrate that we can infer rooted species phylogenies using homologous gene families from complete genomes of 10 bacterial and archaeal groups. Focusing on cyanobacteria, distinguished among prokaryotes by a relative abundance of fossils, we infer the maximum likelihood chronologically ordered species phylogeny based on 36 genomes with 8,332 homologous gene families. We find the order of speciation events to be in full agreement with the fossil record and the inferred phylogeny of cyanobacteria to be consistent with the phylogeny recovered from established phylogenomics methods. Our results demonstrate that lateral gene transfers, detected by probabilistic models of genome evolution, can be used as a source of information on the timing of evolution, providing a valuable complement to the limited prokaryotic fossil record.
molecular dating | gene tree reconciliation | birth-death model
A central aspect of Earth's history is the pattern and timing of diversification of the species that inhabit it.
In macroorganisms such as animals or plants, an abundant fossil record, the accumulation of genomic data, and the development of models of molecular evolution accommodating for varying rates of evolutionary changes among lineages are progressively yielding an intelligible picture (1-4). In contrast, the dating of the evolution of microbial life remains largely elusive (5, 6). This situation results from the convergence of two main factors: first, fossils, especially bacterial and archaeal ones, are scarce or cannot be traced to a specific lineage. Therefore, any inference of the timing of microbial evolution must rely almost exclusively on molecular data constrained only by a handful of dates during the course of more than three billion years of evolution. Second, molecular data can be difficult to interpret in terms of patterns of species diversification. Lateral gene transfers (LGTs), the exchange of genes among possibly distant species, have tangled gene phylogenies to the extent that they provide a deeply blurred view of the relationships between lineages. Different approaches [e.g., concatenation, supertrees (7, 8)] have been proposed to ove...
Lateral gene transfer (LGT), the acquisition of genes from other species, is a major evolutionary force. However, its success as an adaptive process makes the reconstruction of the history of life an intricate puzzle: If no gene has remained unaffected during the course of life's evolution, how can one rely on molecular markers to reconstruct the relationships among species? Here, we take a completely different look at LGT and its impact on the reconstruction of the history of life. Rather than trying to remove the effect of LGT in phylogenies, and ignoring as a result most of the information of gene histories, we use an explicit phylogenetic model of gene transfer to reconcile gene histories with the tree of species. We studied 16 bacterial and archaeal phyla, representing a dataset of 12,000 gene families distributed in 336 genomes. Our results show that, in most phyla, LGT provides an abundant phylogenetic signal on the pattern of species diversification and that this signal is robust to the choice of gene families under study. We also find that LGT brings an abundant signal on the location of the root of species trees, which has been previously overlooked. Our results quantify the great variety of gene transfer rates among lineages of the tree of life and provide strong support for the "complexity hypothesis," which states that genes whose products participate in macromolecular protein complexes are relatively resistant to transfer. genome evolution | phylogeny | bacteria | archaea