Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.
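To make the idea of in silico ORFs from a six-frame translation with alternative start codons more concrete, here is a minimal Python sketch. It is not the iPtgxDB software and does not reproduce whatever specific modifications the authors applied. The function names, the start codon set (ATG, GTG, TTG), the min_aa cutoff, and the choice to report only the longest ORF per stop codon are all assumptions made for this example.

```python
from itertools import product

# Standard codon table in TCAG order. NCBI translation table 11 (bacterial)
# assigns the same amino acids and differs only in the permitted start codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(orf):
    """Translate an ORF (start codon included, stop codon excluded).

    The first codon is always decoded as Met, because alternative start
    codons such as GTG and TTG are read by the initiator tRNA.
    """
    rest = (CODON_TABLE.get(orf[i:i + 3], "X") for i in range(3, len(orf) - 2, 3))
    return "M" + "".join(rest)


def six_frame_orfs(genome, start_codons=("ATG", "GTG", "TTG"), min_aa=10):
    """Yield (strand, frame, start, protein) for the longest ORF per stop codon.

    `start` is a 0-based offset on the scanned strand; for the "-" strand it
    refers to the reverse-complemented sequence (an assumption of this sketch).
    """
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            orf_start = None  # most upstream start codon since the last stop
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if CODON_TABLE.get(codon, "X") == "*":
                    if orf_start is not None:
                        protein = translate(seq[orf_start:i])
                        if len(protein) >= min_aa:
                            yield strand, frame, orf_start, protein
                    orf_start = None
                elif orf_start is None and codon in start_codons:
                    orf_start = i
```

With these assumptions, list(six_frame_orfs("GTGAAATAG", min_aa=1)) returns a single plus-strand ORF translated as MK, whereas restricting start_codons to ("ATG",) returns nothing for the same input. A real database build would also need to handle circular replicons and to consolidate entries that share a stop codon with annotated CDSs, which this sketch leaves out.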
The plant kingdom represents a prominent biodiversity island for microbes that associate with the below- or aboveground organs of vegetal species. Both the root and the leaf represent interfaces where dynamic biological interactions influence plant life. Besides well-studied communication strategies based on soluble compounds and protein effectors, bacteria were recently shown to interact both with host plants and other microbial species through the emission of volatile organic compounds (VOCs). Focusing on the potato late blight-causing agent Phytophthora infestans, this work addresses the potential role of the bacterial volatilome in suppressing plant diseases. In a previous study, we isolated and identified a large collection of strains with anti-Phytophthora potential from both the phyllosphere and the rhizosphere of potato. Here we report the characterization and quantification of their emissions of biogenic volatiles, comparing 16 Pseudomonas strains differing in (i) origin of isolation (phyllosphere vs. rhizosphere), (ii) in vitro inhibition of P. infestans growth and sporulation behavior, and (iii) protective effects against late blight on potato leaf disks. We systematically tested the pharmacological inhibitory activity of core and strain-specific single compounds against P. infestans mycelial growth and sporangial behavior in order to identify key effective candidate molecules present in the complex natural VOC blends. We envisage the plant bacterial microbiome as a reservoir for functional VOCs and establish the basis for finding the primary enzymatic toolset that enables the production of active components of the volatile bouquet in plant-associated bacteria. Comprehension of these functional interspecies interactions will open perspectives for the sustainable control of plant diseases in forthcoming agriculture.
Small open reading frame encoded proteins (SEPs) gained increasing interest during the last few years because of their broad range of important functions in both prokaryotes and eukaryotes. In bacteria, signaling, virulence, and regulation of enzyme activities have been associated with SEPs. Nonetheless, the number of SEPs detected in large-scale proteome studies is often low as classical methods are biased toward the identification of larger proteins. Here, we present a workflow that allows enhanced identification of small proteins compared to traditional protocols. For this aim, the steps of small protein enrichment, proteolytic digest, and database search were reviewed and adjusted to the special requirement of SEPs. Enrichment by the use of small-pore-sized solid-phase material increased the number of identified SEPs by a factor of 2, and utilization of alternative proteases to trypsin reduced the spectral counts for larger proteins. The application of the optimized protocol allowed the detection of 210 already annotated proteins up to 100 amino acids (aa) length, including 16 proteins below 51 aa in the Gram-positive model organism Bacillus subtilis. Moreover, 12% of all identified proteins were up to 100 aa, which is a significantly larger fraction than that reported in studies involving traditional proteomics workflows. Finally, the application of an integrated proteogenomics search database and extensive subsequent validation resulted in the confident identification of three novel, not yet annotated, SEPs, which are 21, 26, and 42 aa long.
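One well-known reason why small proteins are hard to detect with trypsin is that they yield few peptides in the length range a typical database search considers. The following Python sketch illustrates this with a simple in silico digest. It is not the workflow from the study above. The function names, the cleavage rule (after K or R unless followed by P, a commonly used approximation for trypsin), and the 7 to 30 aa length window are assumptions chosen for the example.

```python
def digest(protein, cleave_after="KR", blocked_by="P"):
    """Cleave C-terminal to the given residues unless the next residue blocks it.

    The defaults approximate trypsin; cleave_after="K" would approximate Lys-C.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in cleave_after and nxt not in blocked_by:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without a cleavage site
    return peptides


def searchable_peptides(protein, min_len=7, max_len=30, **rule):
    """Keep only peptides inside an assumed searchable length window."""
    return [p for p in digest(protein, **rule) if min_len <= len(p) <= max_len]
```

Comparing len(searchable_peptides(seq)) with the result for a different rule, for example cleave_after="K", gives a rough feel for how the choice of protease changes the number of candidate peptides for a given small protein.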
Bacterial ribosome-dependent attenuators are widespread posttranscriptional regulators. They harbor small upstream open reading frames (uORFs) encoding leader peptides, for which no functions in trans are known yet. In the plant symbiont Sinorhizobium meliloti, the tryptophan biosynthesis gene trpE(G) is preceded by the uORF trpL and is regulated by transcription attenuation according to tryptophan availability. However, trpLE(G) transcription is initiated independently of the tryptophan level in S. meliloti, thereby ensuring a largely tryptophan-independent production of the leader peptide peTrpL. Here, we provide evidence for a tryptophan-independent role of peTrpL in trans. We found that peTrpL increases the resistance toward tetracycline, erythromycin, chloramphenicol, and the flavonoid genistein, which are substrates of the major multidrug efflux pump SmeAB. Coimmunoprecipitation with a FLAG-peTrpL suggested smeR mRNA, which encodes the transcription repressor of smeABR, as a peptide target. Indeed, upon antibiotic exposure, smeR mRNA was destabilized and smeA stabilized in a peTrpL-dependent manner, showing that peTrpL acts in the differential regulation of smeABR. Furthermore, smeR mRNA was coimmunoprecipitated with peTrpL in antibiotic-dependent ribonucleoprotein (ARNP) complexes, which, in addition, contained an antibiotic-induced antisense RNA complementary to smeR. In vitro ARNP reconstitution revealed that the above-mentioned antibiotics and genistein directly support complex formation. A specific region of the antisense RNA was identified as a seed region for ARNP assembly in vitro. Altogether, our data show that peTrpL is involved in a mechanism for direct utilization of antimicrobial compounds in posttranscriptional regulation of multiresistance genes. Importantly, this role of peTrpL in resistance is conserved in other Alphaproteobacteria. 
IMPORTANCE Leader peptides encoded by transcription attenuators are widespread small proteins that are considered nonfunctional in trans. We found that the leader peptide peTrpL of the soil-dwelling plant symbiont Sinorhizobium meliloti is required for differential, posttranscriptional regulation of a multidrug resistance operon upon antibiotic exposure. Multiresistance achieved by efflux of different antimicrobial compounds ensures survival and competitiveness in nature and is important from both evolutionary and medical points of view. We show that the leader peptide forms antibiotic- and flavonoid-dependent ribonucleoprotein complexes (ARNPs) for destabilization of smeR mRNA encoding the transcription repressor of the major multidrug resistance operon. The seed region for ARNP assembly was localized in an antisense RNA, whose transcription is induced by antimicrobial compounds. The discovery of ARNP complexes as new players in multiresistance regulation opens new perspectives in understanding bacterial physiology and evolution and potentially provides new targets for antibacterial control.
Although complete genome sequences hold particular value for an accurate description of core genomes, the identification of strain-specific genes, and as the optimal basis for functional genomics studies, they are still largely underrepresented in public repositories. Based on an assessment of the genome assembly complexity for all lactobacilli, we used Pacific Biosciences' long read technology to sequence and de novo assemble the genomes of three Lactobacillus helveticus starter strains, raising the number of completely sequenced strains to 12. The first comparative genomics study for L. helveticus—to our knowledge—identified a core genome of 988 genes and sets of unique, strain-specific genes ranging from about 30 to more than 200 genes. Importantly, the comparison of MiSeq- and PacBio-based assemblies uncovered that not only accessory but also core genes can be missed in incomplete genome assemblies based on short reads. Analysis of the three genomes revealed that a large number of pseudogenes were enriched for functional Gene Ontology categories such as amino acid transmembrane transport and carbohydrate metabolism, which is in line with a reductive genome evolution in the rich natural habitat of L. helveticus. Notably, the functional Clusters of Orthologous Groups of proteins categories “cell wall/membrane biogenesis” and “defense mechanisms” were found to be enriched among the strain-specific genes. A genome mining effort uncovered examples where an experimentally observed phenotype could be linked to the underlying genotype, such as for cell envelope proteinase PrtH3 of strain FAM8627. Another possible link identified for peptidoglycan hydrolases will require further experiments. Of note, strain FAM22155 did not harbor a CRISPR/Cas system; its loss was also observed in other L. helveticus strains and lactobacillus species, thus questioning the value of the CRISPR/Cas system for diagnostic purposes. 
Importantly, the complete genome sequences proved to be very useful for the analysis of natural whey starter cultures with metagenomics, as a larger percentage of the sequenced reads of these complex mixtures could be unambiguously assigned down to the strain level.
Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy towards accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics dataset against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and variants identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin, and release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli as well as the software to generate such proteogenomics search databases for any prokaryote.
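The notion of an unambiguous peptide, one that implies a single distinct protein, can be illustrated with a short Python sketch. This is a deliberately simplified, binary version and not the PeptideClassifier scheme or its prokaryotic extension. The function names and the example identifiers are hypothetical, and the consolidation of overlapping entries from different annotation resources described in the abstract is not modeled here.

```python
def map_peptides(peptides, proteins):
    """Map each peptide to the sorted ids of all database entries containing it.

    `proteins` is a dict of entry id -> amino acid sequence (hypothetical layout).
    """
    return {
        pep: sorted(pid for pid, seq in proteins.items() if pep in seq)
        for pep in peptides
    }


def unambiguous(hits):
    """Return peptide -> entry id for peptides that match exactly one entry."""
    return {pep: ids[0] for pep, ids in hits.items() if len(ids) == 1}


def unambiguous_fraction(hits):
    """Fraction of matched peptides that imply a single entry."""
    matched = [ids for ids in hits.values() if ids]
    if not matched:
        return 0.0
    return sum(len(ids) == 1 for ids in matched) / len(matched)
```

In this naive form, a peptide shared by two entries that differ only in their start site counts as ambiguous. Collapsing such entries before classification is one way a larger share of peptides can end up implying one distinct protein.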
Pseudomonas aeruginosa MPAO1 is the parental strain of the widely utilized transposon mutant collection for this important clinical pathogen. Here, we validate a model system to identify genes involved in biofilm growth and biofilm-associated antibiotic resistance. Our model employs a genomics-driven workflow to assemble the complete MPAO1 genome, identify unique and conserved genes by comparative genomics with the PAO1 reference strain and genes missed within existing assemblies by proteogenomics. Among over 200 unique MPAO1 genes, we identified six general essential genes that were overlooked when mapping public Tn-seq data sets against PAO1, including an antitoxin. Genomic data were integrated with phenotypic data from an experimental workflow using a user-friendly, soft lithography-based microfluidic flow chamber for biofilm growth and a screen with the Tn-mutant library in microtiter plates. The screen identified hitherto unknown genes involved in biofilm growth and antibiotic resistance. Experiments conducted with the flow chamber across three laboratories delivered reproducible data on P. aeruginosa biofilms and validated the function of both known genes and genes identified in the Tn-mutant screens. Differential protein abundance data from planktonic cells versus biofilm confirmed the upregulation of candidates known to affect biofilm formation, of structural and secreted proteins of type VI secretion systems, and provided proteogenomic evidence for some missed MPAO1 genes. This integrated, broadly applicable model promises to improve the mechanistic understanding of biofilm formation, antimicrobial tolerance, and resistance evolution in biofilms.