Saliva is a body fluid with important functions in oral and general health. A consortium of three research groups catalogued the proteins in human saliva collected as ductal secretions and made 1166 identifications: 914 in parotid saliva and 917 in submandibular/sublingual saliva. The results showed that a high proportion of the proteins found in plasma and/or tears are also present in saliva, along with unique components. The proteins identified are involved in numerous molecular processes ranging from structural functions to enzymatic/catalytic activities. As expected, the majority mapped to the extracellular and secretory compartments. An immunoblot approach was used to validate the presence in saliva of a subset of the proteins identified by mass spectrometry. These experiments focused on novel constituents and on proteins for which the peptide evidence was relatively weak. Ultimately, information derived from the work reported here and from related published studies can be used to translate blood-based clinical laboratory tests into a format that uses saliva. Additionally, a catalogue of the salivary proteome of healthy individuals allows future analyses of salivary samples from individuals with oral and systemic diseases, with the goal of identifying biomarkers with diagnostic and/or prognostic value for these conditions; another possibility is the discovery of therapeutic targets.
ProLuCID, a new algorithm for peptide identification using tandem mass spectrometry and protein sequence databases, has been developed. The algorithm uses a three-tier scoring scheme. First, a binomial probability is used as a preliminary score to select candidate peptides. The binomial probability scores generated by ProLuCID minimize molecular weight bias and are independent of database size. Second, a modified cross-correlation score (XCorr) is calculated for each candidate peptide selected by the binomial probability. This cross-correlation scoring function models the isotopic distributions of the fragment ions of candidate peptides, which results in higher sensitivity and specificity than those obtained with the SEQUEST XCorr. Finally, ProLuCID uses the distribution of XCorr values for all of the selected candidate peptides to compute a Z score for the peptide hit with the highest XCorr. The ProLuCID Z score combines the discriminative power of XCorr and DeltaCN, the standard parameters for assessing the quality of a peptide identification using SEQUEST, and displays a significant improvement in specificity over the ProLuCID XCorr alone. ProLuCID can also take advantage of high-resolution MS/MS spectra, which leads to further improvements in specificity compared with low-resolution tandem MS data. A comparison of filtered data searched with SEQUEST and ProLuCID at the same false discovery rate, as estimated by a target-decoy database strategy, shows that ProLuCID identified as many as 25% more proteins than SEQUEST. ProLuCID is implemented in Java and can be easily installed on a single computer or a computer cluster.
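The first and third scoring tiers described above can be illustrated with a minimal Python sketch. This is not ProLuCID's implementation (which is in Java and whose exact formulas are not given in the abstract). It assumes that the preliminary score is the negative log of a binomial tail probability for matching at least a given number of fragment peaks by chance, and that the Z score is the distance of the top XCorr from the mean of all candidate XCorr values in units of their standard deviation. The function names and the parameter `p_random` are hypothetical.

```python
import math
from statistics import mean, pstdev


def binomial_tail(n_trials: int, n_hits: int, p_hit: float) -> float:
    """Probability of at least n_hits successes in n_trials Bernoulli trials."""
    return sum(
        math.comb(n_trials, k) * p_hit**k * (1.0 - p_hit) ** (n_trials - k)
        for k in range(n_hits, n_trials + 1)
    )


def preliminary_score(n_fragments: int, n_matched: int, p_random: float) -> float:
    """-log10 of the chance of matching n_matched fragments at random.

    p_random (assumed) is the probability that one theoretical fragment
    matches a peak by chance. Higher scores mean less likely random matches.
    """
    tail = binomial_tail(n_fragments, n_matched, p_random)
    return -math.log10(max(tail, 1e-300))  # floor avoids log10(0)


def z_score_of_top_hit(xcorrs: list[float]) -> float:
    """Standardized distance of the best XCorr from the candidate distribution."""
    spread = pstdev(xcorrs)
    if spread == 0.0:
        return 0.0  # all candidates score the same, so nothing stands out
    return (max(xcorrs) - mean(xcorrs)) / spread
```

A top hit that clearly separates from the other candidates yields a large Z score, which is the same intuition that DeltaCN captures for the gap between the first and second hits.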
DAF-2, an insulin receptor-like protein, regulates metabolism, development, and aging in Caenorhabditis elegans. In a quantitative proteomic study, we identified 86 proteins that were more or less abundant in long-lived daf-2 mutant worms than in wild-type worms. Genetic studies on a subset of these proteins indicated that they act in one or more processes regulated by DAF-2, including entry into the dauer developmental stage and aging. In particular, we discovered a compensatory mechanism activated in response to reduced DAF-2 signaling, which involves the protein phosphatase calcineurin.
Database searching is an essential element of large-scale proteomics. Because these methods are widely used, it is important to understand the rationale of the algorithms. Most algorithms are based on concepts first developed in SEQUEST and PeptideSearch. Four basic approaches are used to determine a match between a spectrum and sequence: descriptive, interpretative, stochastic and probability-based matching. We review the basic concepts used by most search algorithms, the computational modeling of peptide identification and current challenges and limitations of this approach for protein identification.

An unintended consequence of whole-genome sequencing has been the birth of large-scale proteomics. What drives proteomics is the ability to use mass spectrometry data of peptides as an 'address' or 'zip code' to locate proteins in sequence databases. Two mass spectrometry methods are used to identify proteins by database search methods. The first method uses a molecular weight fingerprint measured from a protein digested with a site-specific protease [1-5]. A second method uses tandem mass spectra derived from individual peptides of a digested protein [6,7] (Fig. 1). Because each tandem mass spectrum represents an independent and verifiable piece of data, this approach to database searching has the ability to identify proteins in mixtures, enabling a rapid and comprehensive approach for the analysis of protein complexes and other complicated mixtures of proteins [6,8-12]. New biology has been discovered based on fast and accurate protein identification [13-18].
As tandem mass spectral protein identification has proliferated, it has become increasingly important to understand the rationale of individual database search algorithms, their relative strengths and weaknesses, and the mathematics used to match sequence to spectrum.

In this review we discuss the prevailing fragmentation models, spectral preprocessing, methods to match tandem mass spectra to sequences and several approaches to matching tandem mass spectra of peptides whose exact sequences may not be present in the database. Space limitations restrict a detailed description of all algorithms in this rapidly expanding field. Also, some algorithms are proprietary, and thus, details on how they work are unknown. This review should supplement and update earlier reviews on database search algorithms [19-24].

Peptide fragmentation and data preprocessing

In tandem mass spectrometry (MS/MS), gas phase peptide ions undergo collision-induced dissociation (CID) with molecules of an inert gas such as helium or argon [25]. Other methods of dissociation have been developed, such as electron capture dissociation (ECD), surface induced dissociation (SID) and electron transfer dissociation (ETD), but gas-phase CID is the most widely used in commercial tandem mass spectrometers. The dissociation pathways are strongly dependent on the collision energy, but the vast majority of instruments use low-energy CID (<100 eV) [26] ....
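As a concrete illustration of a fragmentation model, the Python sketch below computes singly charged b- and y-type fragment ion m/z values for a peptide. The truncated passage above does not name these ion types, so their use here is an assumption based on the common interpretation of low-energy CID spectra. The residue masses are standard monoisotopic values, and `fragment_ions` is a hypothetical helper rather than part of any search engine.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton
WATER = 18.010565   # mass of H2O, carried by the C-terminal fragment


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Return (b_ions, y_ions) as singly charged m/z values.

    b_i contains the first i residues; y_i contains the last i residues.
    Modifications and multiply charged fragments are ignored in this sketch.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    prefix = 0.0
    for m in masses[:-1]:            # N-terminal fragments b1 .. b(n-1)
        prefix += m
        b_ions.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # C-terminal fragments y1 .. y(n-1)
        suffix += m
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions
```

A database search engine would generate such theoretical fragments for every candidate peptide and compare them with the observed peaks within a mass tolerance.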
As the speed with which proteomic labs generate data increases along with the scale of the projects they undertake, the resulting data storage and data processing problems will continue to challenge computational resources. This is especially true for shotgun proteomic techniques that can generate tens of thousands of spectra per instrument each day. One design factor leading to many of these problems is the storage of spectra, and of the database identifications for a given spectrum, as individual files. While these problems can be addressed by storing all of the spectra and search results in large relational databases, the infrastructure to implement such a strategy can be beyond the means of academic labs. We report here a series of unified text file formats for storing spectral data (MS1 and MS2) and search results (SQT) that are compact, easily parsed by both machines and humans, and yet flexible enough to be coupled with new algorithms and data-mining strategies.
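To show why a single line-oriented text file is easy to parse, here is a minimal Python sketch of a reader for MS2-style data. The abstract does not spell out the layout, so the line types assumed here (H for header fields, S for a scan with its precursor m/z, Z for a charge state and its M+H mass, I and D for annotations, and bare m/z-intensity pairs for peaks) should be checked against the format specification. `Spectrum` and `parse_ms2` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    first_scan: int
    last_scan: int
    precursor_mz: float
    charges: list = field(default_factory=list)  # (charge, M+H mass) from Z lines
    peaks: list = field(default_factory=list)    # (m/z, intensity) pairs


def parse_ms2(text: str):
    """Parse MS2-style text into (headers, spectra) under the assumed layout."""
    headers, spectra = {}, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        tag = fields[0]
        if tag == "H":
            if len(fields) > 1:
                headers[fields[1]] = " ".join(fields[2:])
        elif tag == "S":
            # A scan line starts a new spectrum.
            spectra.append(Spectrum(int(fields[1]), int(fields[2]), float(fields[3])))
        elif tag == "Z":
            spectra[-1].charges.append((int(fields[1]), float(fields[2])))
        elif tag in ("I", "D"):
            continue  # annotation lines are ignored in this sketch
        else:
            # Anything else is treated as an "m/z intensity" peak line.
            spectra[-1].peaks.append((float(fields[0]), float(fields[1])))
    return headers, spectra
```

Because every spectrum in a run lives in one file, a reader like this makes a single sequential pass instead of opening tens of thousands of small files.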
We carried out a test sample study to identify errors leading to irreproducibility, including incompleteness of peptide sampling, in LC-MS-based proteomics. We distributed a test sample consisting of an equimolar mix of 20 highly purified recombinant human proteins to 27 laboratories for identification. Each protein contained one or more unique tryptic peptides of 1250 Da, which also allowed us to test ion selection and sampling in the mass spectrometer. Initially, only 7 of the 27 labs reported all 20 proteins correctly, and only 1 lab reported all the tryptic peptides of 1250 Da. Nevertheless, a subsequent centralized analysis of the raw data revealed that all 20 proteins and most of the 1250 Da peptides had in fact been detected by all 27 labs. The centralized analysis allowed us to determine the sources of the problems encountered in the study, which include missed identifications (false negatives), environmental contamination, database matching, and curation of protein identifications. Improved search engines and databases are likely to increase the fidelity of mass spectrometry-based proteomics.
This study optimized and compared trypsin digestion strategies for peptide/protein identification by microLC-MS/MS, with or without MS-compatible detergents, in mixed organic-aqueous and aqueous systems. We determine that adding MS-compatible detergents to proteolytic digestion protocols dramatically increases peptide and protein identifications in complex protein mixtures by shotgun proteomics. Protein solubilization and proteolytic efficiency are increased by including MS-compatible detergents in trypsin digestion buffers. A modified trypsin digestion protocol incorporating the MS-compatible detergents consistently identifies over 300 proteins from 5 µg of pancreatic cell lysates and generates a greater number of peptide identifications than trypsin digestion with urea when using LC-MS/MS. Furthermore, over 700 proteins were identified by merging protein identifications from trypsin digestion with three different MS-compatible detergents. We also observe that the use of mixed aqueous and organic solvent systems can influence protein identifications in combination with different MS-compatible detergents. Peptide mixtures generated from different MS-compatible detergents and buffer combinations show a significant difference in hydrophobicity. Our results show that protein digestion schemes incorporating MS-compatible detergents generate quantitative as well as qualitative changes in observed peptide identifications, which lead to increased protein identifications overall and potentially increased identification of low-abundance proteins.
DTASelect provides a means by which complex SEQUEST results can be filtered, organized, and viewed. A single sample may produce tens of thousands of tandem mass spectra. Manually perusing and selecting SEQUEST matches among such a mass of data carries a risk of inconsistency. DTASelect allows the user to set complex criteria for acceptance or rejection of individual spectrum results. It also features rules for dealing with multiple, identical peptide matches and for removing proteins that are insufficiently evidenced. It provides its sorted and filtered summary as HTML and text documents for easy review and also offers several auxiliary reports. DTASelect is a powerful tool for automatic analysis of complex mixture tandem mass spectrometry.
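The kind of rule-based filtering described above can be sketched in a few lines of Python. This is an illustration of the idea, not DTASelect's code or its option set: the `Match` record, the `filter_matches` function, and all threshold parameters are hypothetical. The sketch assumes that matches are accepted by a charge-dependent XCorr cutoff and a DeltaCN cutoff, that duplicate peptide matches are collapsed to the best-scoring one, and that proteins with too few distinct peptides are removed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    protein: str
    peptide: str
    charge: int
    xcorr: float
    delta_cn: float


def filter_matches(matches, min_xcorr_by_charge, min_delta_cn, min_peptides):
    """Return {protein: [accepted matches]} under the given criteria.

    min_xcorr_by_charge maps a charge state to its XCorr cutoff; charge
    states without a cutoff are rejected outright.
    """
    best = {}
    for m in matches:
        if m.xcorr < min_xcorr_by_charge.get(m.charge, float("inf")):
            continue
        if m.delta_cn < min_delta_cn:
            continue
        # Keep only the best-scoring copy of an identical peptide match.
        key = (m.protein, m.peptide)
        if key not in best or m.xcorr > best[key].xcorr:
            best[key] = m

    by_protein = defaultdict(list)
    for m in best.values():
        by_protein[m.protein].append(m)

    # Drop proteins supported by too few distinct peptides.
    return {
        protein: sorted(ms, key=lambda m: -m.xcorr)
        for protein, ms in by_protein.items()
        if len(ms) >= min_peptides
    }
```

Applying one fixed set of criteria to every spectrum is what removes the inconsistency of manual review: the same match is always accepted or rejected for the same reason.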