Background: Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools, as brute-force approaches with quadratic memory requirements become infeasible in practice. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. Results: The program described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply the program to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Conclusions: The program significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.
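The reciprocal best alignment heuristic named above can be illustrated with a minimal sketch. It shows only the plain heuristic, not the extended version the abstract refers to, and it assumes that pairwise alignment scores between two proteomes are already available. The function names, the score dictionaries, and the protein identifiers are all hypothetical.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring target.

    `scores` maps (query, target) -> alignment score (higher is better).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores for proteome A searched against B, and B against A.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b1"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b1", "a3"): 118.0,
          ("b2", "a2"): 175.0, ("b2", "a1"): 88.0}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, `a3` has `b1` as its best hit, but `b1` prefers `a1`, so `a3` is not reported. Note that the sketch keeps only one best hit per query at a time; it says nothing about how the described tool achieves its memory savings.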
Phylogenomics heavily relies on well-curated sequence data sets that comprise, for each gene, exclusively 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can sufficiently reduce the noise to find correct event-annotated gene trees. The information of gene trees can then directly be translated into constraints on the species trees. Although the resolution is very poor for individual gene families, we show that genomewide data sets are sufficient to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer.
orthology | paralogy | gene tree | species tree | cograph
Molecular phylogenetics is primarily concerned with the reconstruction of evolutionary relationships between species based on sequence information. To this end, alignments of protein or DNA sequences are used, whose evolutionary history is believed to be congruent to that of the respective species. This property can be ensured most easily in the absence of gene duplications and horizontal gene transfer (HGT). Phylogenetic studies judiciously select families of genes that rarely exhibit duplications (such as rRNAs, most ribosomal proteins, and many of the housekeeping enzymes). In phylogenomics, elaborate automatic pipelines such as HaMStR (1) are used to filter genomewide data sets to at least deplete sequences with detectable paralogs (homologs in the same species). In the presence of gene duplications, however, it becomes necessary to distinguish between the evolutionary history of genes (gene trees) and the evolutionary history of the species (species trees) in which these genes reside. Leaves of a gene tree represent genes. Their inner nodes represent two kinds of evolutionary events, namely the duplication of genes within a genome, giving rise to paralogs, and speciations, in which the ancestral gene complement is transmitted to two daughter lineages. Two genes are (co)orthologous if their last common ancestor in the gene tree represents a speciation event, whereas they are paralogous if their last common ancestor is a duplication event; see refs. 2 and 3 for a more recent discussion on orthology and paralogy relationships. Speciation events, in turn, define the inner vertices of a species tree. However, they depend on both the gene and the species phylogeny, as well as the reconciliation between the two. The latter identifies speciation vertices in the gene tree with a particular speciation event in the species tree and places the gene duplication events on the edges of the species tree. Intriguingly, it is nevertheless possible in practice to distinguis...
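The definition of orthologs and paralogs via the event at the last common ancestor can be sketched on a toy event-labeled gene tree. This is only an illustration of the definition, assuming the event labels are already known; it is not the tree-free inference or cograph editing procedure described in the abstract. The tree, the gene names, and the function name are hypothetical.

```python
from itertools import combinations

# Hypothetical event-labeled gene tree as nested tuples:
# (event, child, child, ...), with leaves given as gene names.
TREE = ("speciation",
        ("duplication", "a1", "a2"),
        ("speciation", "b1", "c1"))


def collect_relations(node, relations):
    """Return the leaves below `node` and record, for every pair of leaves
    whose last common ancestor is `node`, the relation implied by its event."""
    if isinstance(node, str):
        return [node]
    event, *children = node
    child_leaves = [collect_relations(child, relations) for child in children]
    label = "orthologs" if event == "speciation" else "paralogs"
    # Leaves from different child subtrees have `node` as their LCA.
    for left, right in combinations(child_leaves, 2):
        for x in left:
            for y in right:
                relations[frozenset((x, y))] = label
    return [leaf for group in child_leaves for leaf in group]


relations = {}
all_leaves = collect_relations(TREE, relations)
for pair, label in sorted(relations.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), label)
```

Here `a1` and `a2` are paralogs because they descend from a duplication node, while every pair that first meets at a speciation node is reported as orthologous.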
RNase P is the endonuclease that removes 5' leader sequences from tRNA precursors. In Eukarya, separate RNase P activities exist in the nucleus and mitochondria/plastids. Although all RNase P enzymes catalyze the same reaction, the different architectures found in Eukarya range from ribonucleoprotein (RNP) enzymes with a catalytic RNA and up to 10 protein subunits to single-subunit protein-only RNase P (PRORP) enzymes. Here, analysis of the phylogenetic distribution of RNP and PRORP enzymes in Eukarya revealed 1) a wealth of novel P RNAs in previously unexplored phylogenetic branches and 2) that PRORP enzymes are more widespread than previously appreciated, found in four of the five eukaryal supergroups, in the nuclei and/or organelles. Intriguingly, the occurrence of RNP RNase P and PRORP seems mutually exclusive in genetic compartments of modern Eukarya. Our comparative analysis provides a global picture of the evolution and diversification of RNase P throughout Eukarya.
The elucidation of orthology relationships is an important step both in gene function prediction and towards understanding patterns of sequence evolution. For large data sets, orthology assignments are usually derived directly from sequence similarities because more exact approaches are computationally too expensive. Here we present an extension for a standalone tool, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. The extension substantially reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. It maintains the low memory requirements and the efficient concurrency options of its basis, making the software applicable to very large datasets.
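To make the notion of adjacencies concrete, the following is a minimal sketch that counts conserved gene adjacencies between two genomes with multiple linear chromosomes. It is a deliberate simplification and not the heuristic described in the abstract: gene orientation is ignored, genes are assumed to be matched one-to-one by shared identifiers, and duplicated regions are not handled. All names and the example genomes are hypothetical.

```python
def adjacencies(chromosomes):
    """Return the set of unordered neighbor pairs over linear chromosomes.

    `chromosomes` is a list of gene-identifier lists, one per chromosome.
    No adjacency is formed across chromosome ends.
    """
    adj = set()
    for chrom in chromosomes:
        for left, right in zip(chrom, chrom[1:]):
            adj.add(frozenset((left, right)))
    return adj


# Hypothetical genomes; identical identifiers denote matched genes.
genome_a = [["g1", "g2", "g3", "g4"], ["g5", "g6"]]
genome_b = [["g2", "g1", "g3"], ["g4", "g5", "g6"]]

adj_a = adjacencies(genome_a)
adj_b = adjacencies(genome_b)
shared = adj_a & adj_b                   # conserved adjacencies
breakpoints = len(adj_a) - len(shared)   # adjacencies of A lost in B

print(len(shared), breakpoints)
```

The count of shared adjacencies acts as a similarity measure, while the number of adjacencies of one genome that are missing in the other corresponds to a breakpoint-style distance, which is the relationship the abstract alludes to.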
RNase P is an essential tRNA-processing enzyme in all domains of life. We identified an unknown type of protein-only RNase P in the hyperthermophilic bacterium Aquifex aeolicus: Without an RNA subunit and the smallest of its kind, the 23-kDa polypeptide comprises a metallonuclease domain only. The protein has RNase P activity in vitro and rescued the growth of Escherichia coli and Saccharomyces cerevisiae strains with inactivations of their more complex and larger endogenous ribonucleoprotein RNase P. Homologs of Aquifex RNase P (HARP) were identified in many Archaea and some Bacteria, of which all Archaea and most Bacteria also encode an RNA-based RNase P; activity of both RNase P forms from the same bacterium or archaeon could be verified in two selected cases. Bioinformatic analyses suggest that A. aeolicus and related Aquificaceae likely acquired HARP by horizontal gene transfer from an archaeon.
protein-only RNase P | Aquifex aeolicus | tRNA processing | HARP
The architectural diversity of RNase P enzymes is unique: In Bacteria, Archaea, and in the nuclei and organelles of many Eukarya, RNase P is a complex consisting of a catalytic RNA subunit and a varying number of proteins (one in Bacteria, at least four in Archaea, and up to 10 in Eukarya) (1, 2). A different type of RNase P was discovered more recently in human mitochondria (3) and, subsequently, in land plants and some protists (4, 5). This form, termed proteinaceous or protein-only RNase P (PRORP), lacks any RNA subunit and consists of one or three (animal mitochondria) protein subunit(s); it is found in most branches of the eukaryotic phylogenetic tree (6). Bacterial RNase P enzymes identified so far are composed of a ∼400-nt-long catalytic RNA subunit (encoded by rnpB) and a small protein subunit of ∼14 kDa (encoded by rnpA) (7). However, no rnpA and rnpB genes were identified in the genome of Aquifex aeolicus or other Aquificaceae (8-12). The genetic organization of A. aeolicus tRNAs in tandem clusters and as part of ribosomal operons and the detection of tRNAs with canonical mature 5′-ends in total RNA extracts from A. aeolicus implied the existence of a tRNA 5′-maturation activity (9) that was indeed subsequently detected in cell lysates of A. aeolicus (11, 13). However, to date, the identity and biochemical composition of RNase P in A. aeolicus has remained enigmatic.
Results and Discussion
Here, we pursued a classical biochemical approach to identify the RNase P of A. aeolicus. The purification procedure consisted of three consecutive chromatographic steps: anion exchange, hydrophobic interaction, and size exclusion chromatography (AEC, HIC, and SEC, respectively; Fig. 1A and SI Appendix, Figs. S1-S8). RNase P activity was assayed at all purification steps. To identify putative protein components of the enzyme, fractions with low and high RNase P activity from different purification steps were comparatively analyzed by step-gradient SDS/PAGE, and protein bands correlating with activity (Fig. 1B) were subjected to mass spectrometry. An example i...
RNA has been proposed as an important scaffolding factor in the nucleus, aiding protein complex assembly in the dense intracellular milieu. Architectural contributions of RNA to cytosolic signaling pathways, however, remain largely unknown. Here, we devised a multidimensional gradient approach, which systematically locates RNA components within cellular protein networks. Among a subset of noncoding RNAs (ncRNAs) cosedimenting with the ubiquitin–proteasome system, our approach unveiled ncRNA MaIL1 as a critical structural component of the Toll-like receptor 4 (TLR4) immune signal transduction pathway. RNA affinity antisense purification–mass spectrometry (RAP-MS) revealed MaIL1 binding to optineurin (OPTN), a ubiquitin-adapter platforming TBK1 kinase. MaIL1 binding stabilized OPTN, and consequently, loss of MaIL1 blunted OPTN aggregation, TBK1-dependent IRF3 phosphorylation, and type I interferon (IFN) gene transcription downstream of TLR4. MaIL1 expression was elevated in patients with active pulmonary infection and was highly correlated with IFN levels in bronchoalveolar lavage fluid. Our study uncovers MaIL1 as an integral RNA component of the TLR4–TRIF pathway and predicts further RNAs to be required for assembly and progression of cytosolic signaling networks in mammalian cells.
Bacterial 6S RNAs bind to the housekeeping RNA polymerase (σA-RNAP in Bacillus subtilis) to regulate transcription in a growth phase-dependent manner. B. subtilis expresses two 6S RNAs, 6S-1 and 6S-2 RNA, with different expression profiles. We show in vitro that 6S-2 RNA shares hallmark features with 6S-1 RNA: Both (1) are able to serve as templates for pRNA transcription; (2) bind with comparable affinity to σA-RNAP; (3) are able to specifically inhibit transcription from DNA promoters, and (4) can form stable 6S RNA:pRNA hybrid structures that (5) abolish binding to σA-RNAP. However, pRNAs of equal length dissociate faster from 6S-2 than 6S-1 RNA, owing to the higher A,U-content of 6S-2 pRNAs. This could have two mechanistic implications: (1) Short 6S-2 pRNAs (<10 nt) dissociate faster instead of being elongated to longer pRNAs, which could make it more difficult for 6S-2 RNA-stalled RNAP molecules to escape from the sequestration; and (2) relative to 6S-1 RNA, 6S-2 pRNAs of equal length will dissociate more rapidly from 6S-2 RNA after RNAP release, which could affect pRNA turnover or the kinetics of 6S-2 RNA binding to a new RNAP molecule. As 6S-2 pRNAs have not yet been detected in vivo, we considered that cellular RNAP release from 6S-2 RNA might occur via 6S-1 RNA displacing 6S-2 RNA from the enzyme, either in the absence of pRNA transcription or upon synthesis of very short 6S-2 pRNAs (∼5-mers, which would escape detection by deep sequencing). However, binding competition experiments argued against these possibilities.
The study of enterococcal genomes has grown considerably in recent years. While special attention is usually paid to comparative genomic analysis among clinically relevant isolates, in this study we performed an exhaustive comparative analysis of enterococcal genomes of food origin and/or with potential to be used as probiotics. Beyond common genetic features, we especially aimed to identify those that are specific to enterococcal strains isolated from a certain food-related source as well as features present in a species-specific manner. Thus, the genome sequences of 25 Enterococcus strains, from 7 different species, were examined and compared. Their phylogenetic relationship was reconstructed based on orthologous proteins and whole genomes. Likewise, markers associated with successful colonization (bacteriocin genes and genomic islands) and genome plasticity (phages and clustered regularly interspaced short palindromic repeats) were investigated for lifestyle-specific genetic features. At the same time, a search for antibiotic resistance genes was carried out, since they are of major concern in the food industry. Finally, it was possible to locate 1617 FIGfam families as a core proteome universally present among the genera and to determine that most of the accessory genes code for hypothetical proteins, providing reasonable hints to support their functional characterization.