In the last two years, because of advances in protein separation and mass spectrometry, top-down mass spectrometry has moved from analyzing single proteins to analyzing complex samples and identifying hundreds and even thousands of proteins. However, computational tools for searching top-down spectra against protein databases are still in their infancy. We describe MS-Align+, a fast algorithm for top-down protein identification based on spectral alignment that enables searches for unexpected post-translational modifications. We also propose a method for evaluating the statistical significance of top-down protein identifications and benchmark various software tools on two top-down data sets from Saccharomyces cerevisiae and Salmonella typhimurium. We demonstrate that MS-Align+ significantly increases the number of identified spectra compared with MASCOT and OMSSA on both data sets. Although MS-Align+ and ProSightPC have similar performance on the Salmonella typhimurium data set, MS-Align+ outperforms ProSightPC on the (more complex) Saccharomyces cerevisiae data set. Molecular & Cellular Proteomics 11: 10.1074/mcp.M111.008524, 1-13, 2012.
In the past two decades, proteomics has been dominated by bottom-up mass spectrometry, which analyzes digested peptides rather than intact proteins. Bottom-up approaches, although powerful, have limitations in analyzing protein species, e.g. various proteolytic forms of the same protein or various protein isoforms resulting from alternative splicing. Top-down mass spectrometry focuses on analyzing intact proteins and large peptides (1-10) and has advantages in localizing multiple post-translational modifications (PTMs) in a coordinated fashion (e.g. the combinatorial PTM code) and in identifying multiple protein species (e.g. proteolytically processed protein species) (11). Until recently, most top-down studies were limited to single purified proteins (12-15).
Top-down studies of protein mixtures were restricted by difficulties in separating and fragmenting intact proteins and by a shortage of robust computational tools. In the last two years, because of advances in protein separation and top-down instrumentation, top-down mass spectrometry has moved from analyzing single proteins to analyzing complex samples containing hundreds and even thousands of proteins (16-21). Because algorithms for interpreting top-down spectra are still in their infancy, many recent developments include computational innovations in protein identification.
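The MS-Align+ abstract describes spectral alignment that tolerates unexpected modifications only at a high level. As a minimal sketch of the underlying idea, assume a single unexpected mass shift whose size is fixed by the precursor mass, and assume the spectrum has already been deconvoluted into prefix residue masses. Under those assumptions, every residue can be scored as the possible modification site. This is a simplified one-shift variant for illustration, not the MS-Align+ implementation; the function names and the 0.02 Da tolerance are hypothetical.

```python
from bisect import bisect_left

# Hypothetical one-shift spectral alignment sketch (not the MS-Align+ code).
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O


def prefix_masses(seq):
    """Cumulative residue masses of every prefix of seq (lengths 1..n)."""
    total, out = 0.0, []
    for aa in seq:
        total += RESIDUE_MASS[aa]
        out.append(total)
    return out


def unexplained_shift(seq, precursor_mass):
    """Precursor mass minus the unmodified protein mass (residues + water)."""
    return precursor_mass - (prefix_masses(seq)[-1] + WATER)


def has_peak(sorted_peaks, mass, tol):
    """True if some peak lies within +/- tol Da of mass."""
    i = bisect_left(sorted_peaks, mass - tol)
    return i < len(sorted_peaks) and sorted_peaks[i] <= mass + tol


def align_one_shift(seq, peaks, delta, tol=0.02):
    """Return (best score, residue index) for placing one mass shift delta.

    peaks are deconvoluted prefix residue masses. A prefix that contains the
    modified residue is matched at mass + delta; shorter prefixes unshifted.
    """
    peaks = sorted(peaks)
    prefixes = prefix_masses(seq)[:-1]  # proper prefixes, lengths 1..n-1
    plain = [has_peak(peaks, m, tol) for m in prefixes]
    shifted = [has_peak(peaks, m + delta, tol) for m in prefixes]
    best_score, best_pos = -1, None
    for pos in range(len(seq)):
        # Prefix index j covers residues 0..j, so it carries the shift iff j >= pos.
        score = sum(plain[:pos]) + sum(shifted[pos:])
        if score > best_score:
            best_score, best_pos = score, pos
    return best_score, best_pos
```

Consider an example where the prefix masses of GASPVK are observed with +79.96633 Da added from the serine onward. In that case, `align_one_shift("GASPVK", peaks, 79.96633)` returns `(5, 2)`, which places the shift on residue index 2. A real tool must also handle noise, missing peaks, and several shifts, which this sketch ignores.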
Diatoms play a critical role in the oceans' carbon and silicon cycles; however, a mechanistic understanding of the biochemical processes that contribute to their ecological success remains elusive. Completion of the Thalassiosira pseudonana genome provided 'blueprints' for the potential biochemical machinery of diatoms but offered only limited insight into their biology under various environmental conditions. Using high-throughput shotgun proteomics, we identified a total of 1928 proteins expressed by T. pseudonana cultured under optimal growth conditions, enabling us to analyze this diatom's primary metabolic and biosynthetic pathways. Of the proteins identified, 70% are involved in cellular metabolism, while 11% are involved in the transport of molecules. We identified all of the enzymes involved in the urea cycle, thereby presenting a complete pathway to convert ammonia to urea, along with urea transporters and the urea-degrading enzyme urease. Although metabolic exchange between these pathways remains ambiguous, their constitutive presence suggests complex intracellular nitrogen recycling. In addition, all C4-related enzymes for carbon fixation were identified as abundant, with high protein sequence coverage. Quantification based on mass spectral acquisitions showed that the 20 most abundant proteins included clathrin, the primary structural protein involved in endocytic transport, at an unexpectedly high level. This result highlights a previously overlooked mechanism for the inter- and intracellular transport of nutrients and macromolecules in diatoms, potentially providing a missing link to organelle communication and metabolite exchange. Our results demonstrate the power of proteomics and lay the groundwork for future comparative proteomic studies and directed analyses of specifically expressed proteins and biochemical pathways of oceanic diatoms.
SOX2 is a key gene implicated in maintaining the stemness of embryonic and adult stem cells, and it appears to be reactivated in several human cancers, including glioblastoma multiforme. Using immunoprecipitation (IP)/MS/MS, we identified 144 putative SOX2-interacting proteins. Of note, SOX2 was found to interact with several heterogeneous nuclear ribonucleoprotein family proteins, including HNRNPA2B1, HNRNPA3, HNRNPC, HNRNPK, HNRNPL, HNRNPM, HNRNPR and HNRNPU, as well as other ribonucleoproteins, DNA repair proteins and helicases. Gene ontology (GO) analysis revealed that the SOX2 interactome was enriched for the GO terms GO:0030529 (ribonucleoprotein complex) and GO:0004386 (helicase activity). These findings indicate that SOX2 associates with the heterogeneous nuclear ribonucleoprotein complex, suggesting a possible role for SOX2 in post-transcriptional regulation in addition to its function as a transcription factor.
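The SOX2 abstract does not state which statistic underlies the GO enrichment. A common choice is a one-sided hypergeometric test. It asks how likely it is to see at least k term-annotated proteins among n interactors, drawn from a background of N proteins of which K carry the term. The sketch below assumes that test. The function name is hypothetical. In practice the p-values for many GO terms would also need multiple-testing correction.

```python
from math import comb


# Hypothetical sketch: one-sided hypergeometric test for GO term enrichment.
def hypergeometric_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated background proteins
    K: background proteins carrying the GO term
    n: proteins in the interactome (drawn without replacement)
    k: interactome proteins carrying the GO term
    """
    upper = min(K, n)  # X cannot exceed either K or n
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

A small p-value means the term is over-represented in the interactome relative to the background. The choice of background set (for example, all proteins detectable in the cell type) strongly affects the result.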
We present a precursor ion independent top-down algorithm (PIITA) for automated assignment of protein identifications from tandem mass spectra of whole proteins. To acquire the data, we use data-dependent acquisition to select protein precursor ions eluting from a C4-based HPLC column for collision-induced dissociation in the linear ion trap of an LTQ-Orbitrap mass spectrometer. Gas-phase fractionation is used to increase the number of acquired tandem mass spectra, all of which are recorded in the Orbitrap mass analyzer. To identify proteins, the PIITA algorithm compares deconvoluted, deisotoped, observed tandem mass spectra with all possible theoretical tandem mass spectra for each protein in a genomic sequence database, without regard to the measured parent ion mass. Only after a protein is identified is any difference between measured and theoretical precursor mass used to identify and locate post-translational modifications. We demonstrate the application of PIITA to data generated with our wet-lab approach on a Salmonella typhimurium outer membrane extract and compare these results with bottom-up analysis. From these data, we identify 154 proteins by top-down analysis, 73 of which were not identified in a parallel bottom-up analysis. We also identify 201 unique isoforms of these 154 proteins at a false discovery rate (FDR) of <1%. (J Am Soc Mass
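The key design choice in PIITA is to score candidate proteins by fragment matches alone and to consult the precursor mass only afterwards. The sketch below illustrates that ordering. It assumes fragment masses have already been deconvoluted to neutral masses. It also uses simple b/y-ion match counting as a stand-in for the published scoring function; the function names, tolerance, and scoring are hypothetical.

```python
from bisect import bisect_left

# Hypothetical sketch of precursor-independent identification (not PIITA's code).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O


def theoretical_fragments(seq):
    """Sorted neutral b- and y-ion masses for every backbone cleavage of seq."""
    residues = [RESIDUE_MASS[aa] for aa in seq]
    total = sum(residues)
    frags, running = [], 0.0
    for m in residues[:-1]:
        running += m
        frags.append(running)                  # neutral b ion = prefix residues
        frags.append(total - running + WATER)  # neutral y ion = suffix residues + H2O
    return sorted(frags)


def count_matches(observed, theoretical, tol):
    """Number of observed masses within tol Da of some theoretical mass."""
    hits = 0
    for mass in observed:
        i = bisect_left(theoretical, mass - tol)
        if i < len(theoretical) and theoretical[i] <= mass + tol:
            hits += 1
    return hits


def identify(observed, database, tol=0.02):
    """Best (protein name, match count); the precursor mass is never consulted."""
    scored = ((name, count_matches(observed, theoretical_fragments(seq), tol))
              for name, seq in database.items())
    return max(scored, key=lambda item: item[1])


def precursor_shift(seq, measured_precursor_mass):
    """Mass difference left for PTMs once the protein has been identified."""
    return measured_precursor_mass - (sum(RESIDUE_MASS[aa] for aa in seq) + WATER)
```

Because `identify` never reads the precursor mass, a modified protein still scores on the fragments that do not carry the modification. The residual from `precursor_shift` is then the mass that would be attributed to PTMs.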
Although mass spectrometers are capable of providing high mass accuracy data, assignment of the true monoisotopic precursor ion mass is complicated during data-dependent ion selection for LC-MS/MS analysis of complex mixtures. The complication arises when the chromatographic peak width of a given analyte exceeds the time required to acquire a precursor ion mass spectrum. As a result, many monoisotopic masses are misassigned because they are calculated from a single mass spectrum with poor ion statistics, which samples only a fraction of the total ions available for that analyte. Such data in turn produce errors in automated database searches, where the precursor m/z value is one of the search parameters. We propose here a postacquisition approach to correct misassigned monoisotopic m/z values that involves peak detection over the entire elution profile and correction of the precursor ion monoisotopic mass. Using this approach to reprocess shotgun proteomic data, we increased peptide sequence assignments by 10% while reducing the estimated false positive ratio from 1 to 0.2%. We also show that 4% of the salvaged identifications may be accounted for by correction of mixed tandem mass spectra resulting from simultaneous fragmentation of multiple peptides, a situation we refer to as accidental CID.
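The abstract does not spell out the correction procedure. The sketch below illustrates the general idea under stated assumptions. First, intensities of candidate isotope peaks are summed over every scan of the elution profile. Then the monoisotopic peak is taken as the lowest isotope peak, contiguous with the cluster maximum, whose summed intensity is at least 10% of that maximum. The threshold, tolerance, and function names are hypothetical, and this simple heuristic is not the authors' algorithm.

```python
# Hypothetical heuristic sketch: monoisotopic m/z correction over an elution profile.
C13_SPACING = 1.003355  # mass difference between 13C and 12C, in Da


def summed_cluster(scans, mz, z, tol, offsets):
    """Sum isotope-peak intensities at mz + k*C13_SPACING/z across all scans."""
    sums = {k: 0.0 for k in offsets}
    for peaks in scans:  # each scan: list of (m/z, intensity) pairs
        for k in offsets:
            target = mz + k * C13_SPACING / z
            sums[k] += sum(inten for m, inten in peaks if abs(m - target) <= tol)
    return sums


def corrected_monoisotopic_mz(scans, mz, z, tol=0.01, min_fraction=0.1,
                              offsets=range(-3, 4)):
    """Step down from the most intense summed isotope peak while the
    next-lower peak still reaches min_fraction of that maximum."""
    sums = summed_cluster(scans, mz, z, tol, offsets)
    top = max(sums.values())
    if top == 0:
        return mz  # nothing observed; keep the instrument's assignment
    k = max(sums, key=sums.get)
    while (k - 1) in sums and sums[k - 1] >= min_fraction * top:
        k -= 1
    return mz + k * C13_SPACING / z
```

Suppose a single scan lacks the monoisotopic peak. In that case the function keeps the instrument's +1 isotope assignment. Adding a second scan that contains the monoisotopic peak moves the assignment down by 1.003355/z.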