We present a statistical model to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST. Employing the expectation maximization algorithm, the analysis learns to distinguish correct from incorrect database search results, computing probabilities that peptide assignments to spectra are correct based upon database search scores and the number of tryptic termini of peptides. Using SEQUEST search results for spectra generated from a sample of known protein components, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides. This analysis makes it possible to filter large volumes of MS/MS database search results with predictable false identification error rates and can serve as a common standard by which the results of different research groups are compared.
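To make the expectation-maximization idea concrete, here is a minimal sketch that learns a two-component mixture over a single discriminant score and returns, for each spectrum, the posterior probability that its assignment is correct. It assumes both the correct and incorrect score distributions are Gaussian, which is an illustrative simplification. It also ignores the number of tryptic termini, so it is not the published model. The function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_posteriors(scores, n_iter=200, min_sd=1e-3):
    """Fit a two-Gaussian mixture (incorrect vs correct) by EM.

    Returns P(correct | score) for each input score. Gaussian components are an
    assumption made for illustration only.
    """
    s = sorted(scores)
    n = len(s)
    # Initialise: "incorrect" on the lower half of scores, "correct" on the upper half.
    low, high = s[: n // 2], s[n // 2:]
    mu_i, mu_c = sum(low) / len(low), sum(high) / len(high)
    sd_i = sd_c = max((s[-1] - s[0]) / 4, min_sd)
    pi_c = 0.5  # prior fraction of correct assignments
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each score comes from the "correct" component.
        resp = []
        for x in scores:
            pc = pi_c * normal_pdf(x, mu_c, sd_c)
            pn = (1 - pi_c) * normal_pdf(x, mu_i, sd_i)
            total = pc + pn
            # If both densities underflow, fall back to the nearer component mean.
            resp.append(pc / total if total > 0 else float(abs(x - mu_c) < abs(x - mu_i)))
        # M-step: re-estimate mixing weight, means and standard deviations.
        w_c = sum(resp)
        w_i = n - w_c
        pi_c = w_c / n
        mu_c = sum(r * x for r, x in zip(resp, scores)) / w_c
        mu_i = sum((1 - r) * x for r, x in zip(resp, scores)) / w_i
        sd_c = max(math.sqrt(sum(r * (x - mu_c) ** 2 for r, x in zip(resp, scores)) / w_c), min_sd)
        sd_i = max(math.sqrt(sum((1 - r) * (x - mu_i) ** 2 for r, x in zip(resp, scores)) / w_i), min_sd)
    return resp
```

Filtering at a chosen posterior threshold then gives a predictable expected error rate, because the average of (1 - posterior) over the accepted spectra estimates the fraction of incorrect assignments among them.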
A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample. Peptides that correspond to more than a single protein in the sequence database are apportioned among all corresponding proteins, and a minimal protein list sufficient to account for the observed peptide assignments is derived using the expectation-maximization algorithm. Using peptide assignments to spectra generated from a sample of 18 purified proteins, as well as complex H. influenzae and Halobacterium samples, the model is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications. This method allows filtering of large-scale proteomics data sets with predictable sensitivity and false positive identification error rates. Fast, consistent, and transparent, it provides a standard for publishing large-scale protein identification data sets in the literature and for comparing the results obtained from different experiments.
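The following is a simplified sketch of how peptide-level probabilities can be combined into protein-level probabilities while apportioning shared peptides. It treats peptide evidence as independent and combines it as P(protein) = 1 - ∏(1 - w·p). Each shared peptide's weight w is re-estimated iteratively in proportion to the current probabilities of the proteins it maps to. This is an illustration under those assumptions. It omits the minimal-protein-list derivation and other refinements of the published model, and the function name is hypothetical.

```python
from math import prod


def protein_probabilities(peptide_probs, peptide_to_proteins, n_iter=50):
    """Combine peptide probabilities into protein probabilities.

    peptide_probs: {peptide: P(assignment correct)}
    peptide_to_proteins: {peptide: [protein IDs containing that peptide]}
    """
    proteins = {p for prots in peptide_to_proteins.values() for p in prots}
    prot_prob = {p: 1.0 for p in proteins}
    for _ in range(n_iter):
        # Apportion each peptide among its proteins in proportion to their current probabilities.
        weights = {}
        for pep, prots in peptide_to_proteins.items():
            total = sum(prot_prob[p] for p in prots)
            for p in prots:
                weights[(pep, p)] = prot_prob[p] / total if total > 0 else 1.0 / len(prots)
        # A protein is absent only if none of its (weighted) peptide assignments is correct.
        prot_prob = {
            prot: 1 - prod(1 - weights[(pep, prot)] * peptide_probs[pep]
                           for pep, prots in peptide_to_proteins.items() if prot in prots)
            for prot in proteins
        }
    return prot_prob


result = protein_probabilities(
    {"PEPA": 0.95, "PEPB": 0.9, "PEPC": 0.2},
    {"PEPA": ["A"], "PEPB": ["A", "B"], "PEPC": ["B"]},
)
```

In this toy input, the shared peptide PEPB is gradually credited mostly to protein A, which also has strong unique evidence. Protein B keeps only a modest probability.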
Affinity purification coupled with mass spectrometry (AP-MS) is now a widely used approach for the identification of protein-protein interactions. However, for any given protein of interest, determining which of the identified polypeptides represent bona fide interactors versus those that are background contaminants (e.g. proteins that interact with the solid-phase support, affinity reagent or epitope tag) is a challenging task. While the standard approach is to identify nonspecific interactions using one or more negative controls, most small-scale AP-MS studies do not capture a complete, accurate background protein set. Fortunately, negative controls are largely bait-independent. Hence, aggregating negative controls from multiple AP-MS studies can increase coverage and improve the characterization of background associated with a given experimental protocol. Here we present the Contaminant Repository for Affinity Purification (the CRAPome) and describe the use of this resource to score protein-protein interactions. The repository (currently available for Homo sapiens and Saccharomyces cerevisiae) and computational tools are freely available online at www.crapome.org.
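As a rough illustration of why pooled negative controls help, the sketch below computes how often each protein appears across a set of control runs. It then flags bait-purification hits that exceed a frequency cutoff. The 0.3 cutoff, the protein IDs and the function names are hypothetical. The repository's own scoring tools are more elaborate than a single frequency threshold.

```python
from collections import Counter


def control_frequency(control_runs):
    """Fraction of negative-control runs in which each protein was detected."""
    counts = Counter(prot for run in control_runs for prot in set(run))
    return {prot: c / len(control_runs) for prot, c in counts.items()}


def likely_contaminants(bait_hits, control_runs, max_freq=0.3):
    """Return hits that appear in more than max_freq of the control runs (hypothetical cutoff)."""
    freq = control_frequency(control_runs)
    return sorted(p for p in bait_hits if freq.get(p, 0.0) > max_freq)
```

Aggregating more control runs makes these frequencies better estimates of how often a protein shows up as background under a given protocol.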
There is a need to better understand and handle the “dark matter” of proteomics: the vast diversity of post-translational and chemical modifications that are unaccounted for in a typical analysis and thus remain unidentified. We present a novel fragment-ion indexing method, and its implementation in the peptide identification tool MSFragger, that enables an over 100-fold improvement in speed over most existing tools. Using some of the largest proteomic datasets to date, we demonstrate how MSFragger empowers the open database search concept for comprehensive identification of peptides and all their modified forms, uncovering dramatic differences in modification rates across experimental samples and conditions. We further illustrate its utility using protein-RNA crosslinked peptide data, and using affinity purification experiments where we observe on average a 300% increase in the number of identified spectra for enriched proteins. We also discuss the benefits of open searching for improved false discovery rate estimation in proteomics.
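The general idea of a fragment-ion index can be sketched as follows. All theoretical fragment m/z values for every candidate peptide are precomputed into one sorted list. Each experimental peak is then matched by binary search within a tolerance, rather than by comparing the spectrum with each peptide in turn. This is a simplified illustration, not MSFragger's implementation. It assumes singly charged b and y ions only, no modifications and no precursor-mass binning, and the function names and the 0.02 tolerance are hypothetical.

```python
import bisect
from collections import Counter

PROTON = 1.007276
WATER = 18.010565
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_masses(peptide):
    """Singly charged b- and y-ion m/z values for one peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    out = []
    for i in range(1, len(peptide)):
        out.append(sum(masses[:i]) + PROTON)          # b_i ion
        out.append(sum(masses[i:]) + WATER + PROTON)  # complementary y ion
    return out


def build_index(peptides):
    """Sorted (fragment m/z, peptide index) entries plus a parallel list of m/z keys."""
    entries = sorted((mz, idx) for idx, pep in enumerate(peptides) for mz in fragment_masses(pep))
    return [mz for mz, _ in entries], entries


def score_spectrum(peaks, index, tol=0.02):
    """Count matched fragment ions per peptide index using binary search on the index."""
    mzs, entries = index
    hits = Counter()
    for peak in peaks:
        lo = bisect.bisect_left(mzs, peak - tol)
        hi = bisect.bisect_right(mzs, peak + tol)
        for _, idx in entries[lo:hi]:
            hits[idx] += 1
    return hits
```

Because each lookup touches only the slice of the index within the tolerance window, the cost per peak grows roughly logarithmically with database size. This is what makes wide (open) precursor-mass windows tractable in principle.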
The shotgun proteomic strategy, based on digesting proteins into peptides and sequencing them using tandem mass spectrometry and automated database searching, has become the method of choice for identifying proteins in most large-scale studies. However, the peptide-centric nature of shotgun proteomics complicates the analysis and biological interpretation of the data, especially in the case of higher eukaryotic organisms. The same peptide sequence can be present in multiple different proteins or protein isoforms. Such shared peptides can therefore lead to ambiguities in determining the identities of sample proteins. In this article we illustrate the difficulties of interpreting shotgun proteomic data and discuss the need for common nomenclature and transparent informatic approaches. We also discuss related issues such as the state of protein sequence databases and their role in shotgun proteomic analysis, interpretation of relative peptide quantification data in the presence of multiple protein isoforms, the integration of proteomic and transcriptional data, and the development of a computational infrastructure for the integration of multiple diverse datasets. Molecular & Cellular Proteomics 4:1419-1440, 2005.
An explicit goal of proteomics is the identification and quantification of all the proteins expressed in a cell or tissue (1). Although not yet at the levels of data throughput and automation achieved in other genomic analyses such as DNA sequencing or microarray gene expression analysis, global protein profiling methods are rapidly evolving. This has been possible because of recent improvements in MS instrumentation, protein and peptide separation techniques, computational data analysis tools, and the availability of complete sequence databases for many species. As a result, analysis of complex protein mixtures using shotgun proteomics, a strategy based on the combination of protein digestion and MS/MS-based peptide sequencing (2-4), has become widely adopted.
The method allows protein identification and, when combined with stable isotope labeling, quantification of changes in protein expression levels for hundreds of proteins in a single experiment (1). Compared with other MS-based proteomic technologies such as intact protein sequencing (5, 6) or 2D gel-based protein analysis (7), shotgun proteomic analysis has achieved relatively high throughput. This is the result of a combination of several factors. Proteolytic digestion of proteins into shorter peptides simplifies MS/MS sequencing (peptides are easier to fragment in the mass spectrometer than intact proteins), whereas elimination of 2D gel-based separation at the protein level simplifies sample handling and increases overall data throughput. At the same time, computational analysis and interpretation of the data become more challenging (8-13). The first and foremost computational challenge is the need to process large volumes of acquired MS/MS data in order to identify the peptides that gave rise to the observed spectra. This ch...
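The shared-peptide ambiguity described above can be illustrated with a toy example. Identified peptides are mapped to every protein sequence that contains them. Proteins supported by exactly the same peptide set are then grouped, since the peptide evidence alone cannot tell them apart. The sequences, IDs and function names are hypothetical. Real tools must also handle issues this sketch ignores, such as isobaric leucine/isoleucine and enzyme-specific digestion rules.

```python
from collections import defaultdict


def map_peptides(peptides, database):
    """Return {peptide: sorted IDs of proteins whose sequence contains the peptide}."""
    return {pep: sorted(pid for pid, seq in database.items() if pep in seq) for pep in peptides}


def group_indistinguishable(pep_map):
    """Group proteins that are matched by exactly the same set of identified peptides."""
    prot_peps = defaultdict(set)
    for pep, prots in pep_map.items():
        for prot in prots:
            prot_peps[prot].add(pep)
    groups = defaultdict(list)
    for prot, peps in prot_peps.items():
        groups[frozenset(peps)].append(prot)
    return [sorted(g) for g in groups.values()]


database = {
    "ISO1": "MKTAYIAKQRLVEGR",
    "ISO2": "MKTAYIAKQRLVEGRSEK",  # isoform with a C-terminal extension
    "OTHER": "MSDNLLKGW",
}
pep_map = map_peptides(["TAYIAK", "LVEGR", "SDNLLK"], database)
```

Here both identified peptides of ISO1 also occur in ISO2, so the two isoforms collapse into one indistinguishable group. Only a peptide covering the extension unique to ISO2 could separate them.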
We present SAINT (Significance Analysis of INTeractome), a computational tool that assigns confidence scores to protein-protein interaction data generated using affinity-purification coupled to mass spectrometry (AP-MS). The method utilizes label-free quantitative data and constructs separate distributions for true and false interactions to derive the probability of a bona fide protein-protein interaction. We demonstrate that SAINT is applicable to data of different scales and protein connectivity and allows for the transparent analysis of AP-MS data.
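The core idea of scoring an interaction against separate "true" and "false" distributions of label-free counts can be sketched with Bayes' rule. This sketch assumes both distributions are Poisson with fixed, hypothetical means and an equal prior. SAINT itself estimates its distributions from the data with a more elaborate model, so this is only an illustration of the principle. The function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing k counts given mean lam (> 0), computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def interaction_probability(count, lam_true, lam_false, prior_true=0.5):
    """Posterior probability that a prey with this spectral count is a bona fide interactor."""
    pt = prior_true * poisson_pmf(count, lam_true)
    pf = (1 - prior_true) * poisson_pmf(count, lam_false)
    return pt / (pt + pf)
```

With a "true" mean of 15 spectra and a "false" mean of 2, for example, a prey seen with 20 spectra scores close to 1, and a prey seen once scores close to 0.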
Mammalian SWI/SNF [also called BAF (Brg/Brahma-associated factors)] ATP-dependent chromatin remodeling complexes are essential for formation of the totipotent and pluripotent cells of the early embryo. In addition, subunits of this complex have been recovered in screens for genes required for nuclear reprogramming in Xenopus and for mouse embryonic stem (ES) cell morphology. However, the mechanism underlying the roles of these complexes is unclear. Here, we show that BAF complexes are required for the self-renewal and pluripotency of mouse ES cells but not for the proliferation of fibroblasts or other cells. Proteomic studies reveal that ES cells express distinctive complexes (esBAF) defined by the presence of Brg (Brahma-related gene), BAF155, and BAF60A, and the absence of Brm (Brahma), BAF170, and BAF60C. We show that this specialized subunit composition is required for ES cell maintenance and pluripotency. Our proteomic analysis also reveals that esBAF complexes interact directly with key regulators of pluripotency, suggesting that esBAF complexes are specialized to interact with ES cell-specific regulators, providing a potential explanation for the requirement of BAF complexes in pluripotency.
BAF complexes | BAF155 | Brg
ES cells are pluripotent cells capable of both limitless self-renewal and differentiation into all embryonic lineages. These abilities are conferred by various mechanisms, including transcription factors (1-3), possibly Polycomb complexes (4, 5), microRNAs (6), and histone modification enzymes (7) that work in coordination to maintain the expression of pluripotency genes while repressing lineage-determinant genes. The involvement of such mechanisms in pluripotency has been investigated extensively in recent years (reviewed in ref. 8), but the role of chromatin remodeling enzymes remains unclear.
The mammalian genome encodes about 30 SWI2/SNF2-like ATPases, which are assembled into SWI/SNF-like complexes with ATP-dependent chromatin remodeling activity. Of these, Brg and Brm are alternative ATPases of a family of 2-MDa multisubunit SWI/SNF or BAF complexes and make up the prototypic mammalian SWI/SNF-like chromatin remodeling complexes (9, 10). BAF complexes have been shown to be essential for many aspects of mammalian development (11-13). A role of BAF complexes in pluripotency is suggested by observations that deletion of Brg, BAF155 (or Srg3), or BAF47 (or hSNF5) leads to peri-implantation lethality and to failure of the totipotent cells that give rise to both the inner cell mass and trophoblast to survive and grow (14-16). The catalytic ATPase subunit, Brg, was also recovered in screens for factors essential for nuclear reprogramming (17) and for ES cell morphology (18). In addition, ES cells lacking BAF250 have defects in ES cell maintenance and differentiation (19, 20). However, the mechanism by which BAF complexes help to establish and maintain pluripotency is not understood.
In vitro, BAF complexes use energy generated from ATP hydrolysis to alter DNA-nucleosome contacts (21) and can also e...