N-terminal methionine excision (NME) and N-terminal acetylation (NTA) are two of the most common protein post-translational modifications. NME is a universally conserved activity and a highly specific mechanism across all life forms. NTA is very common in eukaryotes but occurs rarely in prokaryotes. Analysis of data sets from yeast, mammals and bacteria (including 112 million spectra from 57 bacterial species), the largest comparative proteogenomics study to date, shows that previous assumptions about the specificity and purposes of NME are not entirely correct. Although NME, through the universal enzymatic specificity of the methionine aminopeptidases, results in the removal of the initiator Met in proteins when the second residue is Gly, Ala, Ser, Cys, Thr, Pro, or Val, the comparative genomic analyses suggest that this specificity may vary modestly in some organisms. In addition, the functional role of NME may be primarily to expose Ala and Ser rather than all seven of these residues. Although any member of this group provides a "stabilizing" N terminus in the N-end rule, and de facto leaves the remaining 13 amino acid types that are classed as "destabilizing" (in higher eukaryotes) protected by the initiator Met, the conservation of NME-substrate proteins through evolution suggests that the other five are not crucially important for proteins with these residues in the second position. They are apparently merely inconsequential players (their function is not affected by NME) that become exposed because their side chains are smaller than or comparable to those of Ala and Ser. The importance of exposing mainly two amino acids at the N terminus, i.e. Ala and Ser, is unclear but may be related to NTA or other post-translational modifications. In this regard, these analyses also reveal that NTA is more prevalent in some prokaryotes than previously appreciated. Molecular & Cellular Proteomics
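The canonical specificity rule stated in the abstract above (initiator Met removed when the second residue is Gly, Ala, Ser, Cys, Thr, Pro, or Val) can be written as a small predicate. This is only a minimal sketch of that textbook rule, not the study's analysis pipeline; the function names `predict_nme` and `mature_n_terminus` are hypothetical, and the sketch ignores the organism-specific variation the abstract reports.

```python
# Second residues after which the initiator Met is removed, as listed in
# the abstract: Gly, Ala, Ser, Cys, Thr, Pro, Val (one-letter codes).
NME_SECOND_RESIDUES = frozenset("GASCTPV")


def predict_nme(protein: str) -> bool:
    """Return True if the canonical rule predicts initiator Met excision.

    Hypothetical helper: applies only the seven-residue rule and does not
    model any organism-specific deviation.
    """
    if len(protein) < 2 or protein[0] != "M":
        return False
    return protein[1] in NME_SECOND_RESIDUES


def mature_n_terminus(protein: str) -> str:
    """Return the sequence after predicted NME (unchanged if Met is retained)."""
    return protein[1:] if predict_nme(protein) else protein
```

For example, `mature_n_terminus("MSTK")` drops the leading Met because Ser is in the second position, while a sequence starting with `MK` is returned unchanged.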
The analysis of concentrations of circulating antibodies in serum (antibody repertoire) is a fundamental, yet poorly studied, problem in immunoinformatics. The two current approaches to the analysis of antibody repertoires [next generation sequencing (NGS) and mass spectrometry (MS)] present difficult computational challenges since antibodies are not directly encoded in the germline but are extensively diversified by somatic recombination and hypermutations. Therefore, the protein database required for the interpretation of spectra from circulating antibodies is custom for each individual. Although such a database can be constructed via NGS, the reads generated by NGS are error-prone and even a single nucleotide error precludes identification of a peptide by the standard proteomics tools. Here, we present the IgRepertoireConstructor algorithm that performs error-correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires. Availability and implementation: IgRepertoireConstructor is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from http://bioinf.spbau.ru/igtools. Contact: ppevzner@ucsd.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
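To illustrate why a single nucleotide error matters for database search, the sketch below translates a correct read and a read with one substituted base, then checks both against a set of observed peptides by exact match. It is not part of IgRepertoireConstructor; the reads, the peptide set, and the `translate` helper are made up for illustration, and exact string matching is a simplification of how proteomics search tools score peptides.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 0, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical data: one true read, and the same read with A -> G at index 3.
true_read = "GCTAAAGTT"
erroneous_read = "GCTGAAGTT"
observed_peptides = {translate(true_read)}

# The erroneous read encodes a different peptide, so an exact-match
# lookup against the observed peptides fails.
matches_true = translate(true_read) in observed_peptides
matches_error = translate(erroneous_read) in observed_peptides
```

Here the substitution turns a Lys codon into a Glu codon, so the database entry derived from the erroneous read no longer corresponds to the peptide that was actually measured.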
Aiming toward an improved understanding of the regulation of proteins in cancer, recent studies from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) have focused on analyzing cancer tissue using proteomic technologies and workflows. Although many proteogenomics approaches for the study of cancer samples have been proposed, serious methodological challenges remain, especially in the identification of multiple mutational variants or structural variations such as fusion gene events. In addition, although immune system genes play an important role in cancer, identification of IgG peptides remains challenging in proteomic data sets. Here, we describe an integrative proteogenomic method that extends the limit of proteogenomic searches to identify multiple variant peptides as well as immunoglobulin gene variations/rearrangements using customized mining of RNA-seq data. Our results also provide the first extensive characterization of tumor immune response and demonstrate the potential of this method to improve the molecular characterization of tumor subtypes.
Summary: Over the last five years proteogenomics (using mass spectrometry to identify proteins predicted from genomic sequences) has emerged as a promising approach to the high-throughput identification of protein N-termini, which remains a problem in genome annotation. Comparison of the experimentally determined N-termini with those predicted by sequence analysis tools allows identification of the signal peptides and therefore conclusions on the cytoplasmic or extracytoplasmic (periplasmic or extracellular) localization of the respective proteins. We present here the results of a proteogenomic study of the signal peptides in Escherichia coli K-12 and compare its results with the available experimental data and predictions by such software tools as SignalP and Phobius. A single proteogenomics experiment recovered more than a third of all signal peptides that had been experimentally determined during the past three decades and confirmed at least 31 additional signal peptides, mostly in the known exported proteins, which had been previously predicted but not validated. The filtering of putative signal peptides for the peptide length and the presence of an eight-residue hydrophobic patch and a typical signal peptidase cleavage site proved sufficient to eliminate the false-positive hits. Surprisingly, the results of this proteogenomics study, as well as a re-analysis of the E. coli genome with the latest version of the SignalP program, show that the fraction of proteins containing signal peptides is only about 10%, or half of previous estimates.
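The three filtering criteria named in the abstract above (peptide length, an eight-residue hydrophobic patch, and a typical signal peptidase cleavage site) could be combined roughly as follows. This is a sketch under stated assumptions, not the authors' filter: the abstract gives no thresholds, so the length bounds, the hydrophobic residue set, the requirement that the patch be eight consecutive hydrophobic residues, and the use of small residues at positions -3 and -1 as the cleavage motif are all assumptions chosen for illustration.

```python
# All constants below are assumptions; the abstract does not specify them.
HYDROPHOBIC = frozenset("AVLIMFW")  # assumed hydrophobic residue set
SMALL = frozenset("AGS")            # assumed residues allowed at -3 and -1
PATCH_LENGTH = 8                    # "eight-residue hydrophobic patch"


def has_hydrophobic_patch(peptide: str, window: int = PATCH_LENGTH) -> bool:
    """True if the peptide contains `window` consecutive hydrophobic residues."""
    run = 0
    for aa in peptide:
        run = run + 1 if aa in HYDROPHOBIC else 0
        if run >= window:
            return True
    return False


def has_cleavage_motif(peptide: str) -> bool:
    """True if positions -3 and -1 before the cleavage point are small residues."""
    return len(peptide) >= 3 and peptide[-3] in SMALL and peptide[-1] in SMALL


def looks_like_signal_peptide(
    peptide: str, min_len: int = 15, max_len: int = 45
) -> bool:
    """Apply the length, hydrophobic-patch, and cleavage-site filters together."""
    return (
        min_len <= len(peptide) <= max_len
        and has_hydrophobic_patch(peptide)
        and has_cleavage_motif(peptide)
    )
```

A candidate is kept only if all three checks pass, so a peptide with a plausible cleavage motif but no hydrophobic core, or the reverse, is rejected as a likely false positive.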