Source code and binaries for this software are available for Windows, Linux, and Macintosh OS X at http://www.proteome.ca/opensource.html. The authors release the source code under the Artistic License.
This paper describes an open-source system for analyzing, storing, and validating proteomics information derived from tandem mass spectrometry. It is based on a combination of data analysis servers, a user interface, and a relational database. The database was designed to store the minimum amount of information necessary to search and retrieve data obtained from the publicly available data analysis servers. Collectively, this system was referred to as the Global Proteome Machine (GPM). The components of the system have been made available as open-source development projects. A publicly available system has been established, comprising a group of data analysis servers and one main database server.
An algorithm for reducing the time necessary to match a large set of peptide tandem mass spectra with a list of protein sequences is described. This algorithm breaks the process into multiple steps. A rapid survey step identifies all protein sequences that are reasonable candidates for a match with a set of tandem mass spectra. These candidates are then used as models, which are refined by detailed analysis of the set of tandem mass spectra for evidence of incomplete enzymatic hydrolysis, non-specific hydrolysis and chemical modifications of amino acid residues resulting from either post-translational modifications or sample handling. Compared with current one-step methods for matching proteins to mass spectra, this multiple-step method can decrease the time required for the calculation by several orders of magnitude.
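The survey-then-refine idea in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: it matches peptide masses rather than full tandem spectra, uses a truncated residue-mass table, ignores the proline cleavage rule, and all function names, the tolerance, and the choice of methionine oxidation as the only modification are assumptions made for illustration. The point it shows is that the expensive search (missed cleavages, modifications) runs only on proteins that passed the cheap survey.

```python
# Toy two-step search: cheap survey over all proteins, detailed refinement on candidates only.
# Assumption: spectra are reduced to observed peptide masses; real searches score fragment ions.

# Monoisotopic residue masses in Da (truncated table, enough for the example).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "T": 101.04768, "L": 113.08406, "K": 128.09496, "M": 131.04049,
    "F": 147.06841, "R": 156.10111,
}
WATER = 18.01056
OXIDATION = 15.99491  # hypothetical variable modification on M, tried only in refinement


def cleave(sequence):
    """Split after every K or R (simplified trypsin rule)."""
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    return pieces


def tryptic_peptides(sequence, missed_cleavages=0):
    """Peptides formed by joining up to `missed_cleavages` + 1 adjacent pieces."""
    pieces = cleave(sequence)
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_mass(peptide, oxidized=0):
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + oxidized * OXIDATION


def matches(mass, observed, tol=0.01):
    return any(abs(mass - m) <= tol for m in observed)


def survey(proteins, observed):
    """Step 1: keep proteins with at least one unmodified, fully cleaved peptide match."""
    return {
        name for name, seq in proteins.items()
        if any(matches(peptide_mass(p), observed) for p in tryptic_peptides(seq))
    }


def refine(proteins, candidates, observed, max_missed=2):
    """Step 2: for candidates only, also try missed cleavages and oxidized methionine."""
    hits = {}
    for name in candidates:
        found = set()
        for pep in tryptic_peptides(proteins[name], max_missed):
            for n_ox in range(pep.count("M") + 1):
                if matches(peptide_mass(pep, n_ox), observed):
                    found.add((pep, n_ox))
        hits[name] = found
    return hits
```

In this sketch the refinement cost scales with the number of surviving candidates rather than with the size of the full sequence list, which is the source of the speed-up the abstract describes.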
The proposed model is based on the measurement of the retention times of 346 tryptic peptides in the 560- to 4,000-Da mass range, derived from a mixture of 17 protein digests. These peptides were measured in HPLC-MALDI MS runs, with peptide identities confirmed by MS/MS. The model relies on summation of the retention coefficients of the individual amino acids, as in previous approaches, but additional terms are introduced that depend on the retention coefficients for amino acids at the N-terminus of the peptide. In the 17-protein mixture, optimization of two sets of coefficients, along with additional compensation for peptide length and hydrophobicity, yielded a linear dependence of retention time on hydrophobicity, with an R² value of about 0.94. The predictive capability of the model was used to distinguish peptides with close m/z values and for detailed peptide mapping of selected proteins. Its applicability was tested on columns of different sizes, from nano- to narrow-bore, and for direct sample injection or injection via a pre-column. It can be used for accurate prediction of retention times for tryptic peptides on reversed-phase (300-Å pore size) columns of different sizes with a linear water-ACN gradient and with TFA as the ion-pairing modifier. Molecular & Cellular Proteomics 3:908-919, 2004.
The application of MS to biomolecular analysis has revolutionized protein research within the past decade (1). This can be mostly attributed to the development of ionization techniques that are compatible with biomolecules, i.e. MALDI (2, 3) and ESI (4), as well as improved instrumentation. However, although modern mass spectrometers provide high mass accuracy and sensitivity, the protein complexity and concentration range usually found in biological samples still present a challenge.
The problem has been traditionally attacked by separation of complex protein mixtures by two-dimensional gel electrophoresis, with subsequent protein in-gel digestion, followed by ESI or MALDI MS. This remains one of the most popular sample preparation procedures, especially suitable for protein identification and quantitation. However, the method is best suited for higher abundance proteins with masses greater than 12-14 kDa, and some categories of molecules, such as membrane proteins (1) or species with extremes in isoelectric points, are handled poorly. There are also difficulties in adapting the method to high-throughput applications.
Alternative analytical approaches are based on pre-fractionation of protein mixtures or cell lysates before the final MS steps of analysis (5-9). This often involves proteolytic digestion, followed by one- or multi-dimensional chromatographic separation of the resulting peptides, with subsequent detection by MS/MS. Such a method may yield considerable simplification of the problem, because the fractions from on- or off-line HPLC separations have reduced complexity compared with the original sample. Indeed, the combination of HPLC-ESI (MS or MS/MS) has proved to be a "work horse" for large-scale high-throug...
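The additive retention model summarized in the abstract of this paper (summed per-residue coefficients plus extra terms for residues at the N-terminus, followed by a linear mapping to retention time) can be sketched as below. Every number here is a placeholder: the coefficient tables, the N-terminal weights, and the slope and intercept are hypothetical and are not the published values. The compensation for peptide length and hydrophobicity mentioned in the abstract is omitted because its form is not given there.

```python
# Minimal sketch of an additive retention-time model with N-terminal correction terms.
# All coefficients are PLACEHOLDERS chosen for illustration, not fitted or published values.

# First coefficient set: contribution of each residue anywhere in the peptide.
RC = {"G": 0.0, "A": 1.0, "V": 5.0, "L": 9.0, "F": 10.0, "K": -2.0, "R": -1.0}

# Second coefficient set: extra contribution when the residue sits near the N-terminus.
RC_NTERM = {"G": 0.0, "A": -0.5, "V": -1.0, "L": -2.0, "F": -2.5, "K": 1.0, "R": 0.5}

# Hypothetical weights for the first, second, and third N-terminal positions.
NTERM_WEIGHTS = (0.5, 0.25, 0.125)


def hydrophobicity(peptide):
    """Sum residue coefficients, then add weighted N-terminal terms."""
    total = sum(RC[residue] for residue in peptide)
    # zip stops at the shorter input, so short peptides use fewer N-terminal terms.
    for weight, residue in zip(NTERM_WEIGHTS, peptide):
        total += weight * RC_NTERM[residue]
    return total


def predict_retention_time(peptide, slope, intercept):
    """Linear dependence of retention time on hydrophobicity.

    `slope` and `intercept` would come from calibrating against peptides
    measured on a given column and gradient; here they are caller-supplied.
    """
    return intercept + slope * hydrophobicity(peptide)
```

Keeping the two coefficient sets in separate tables mirrors the abstract's description of optimizing two sets of coefficients, and the linear step is where column size and gradient differences would be absorbed in this sketch.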
A system for creating a library of tandem mass spectra annotated with corresponding peptide sequences was described. This system was based on the annotated spectra currently available in the Global Proteome Machine Database (GPMDB). The library spectra were created by averaging together spectra that were annotated with the same peptide sequence, sequence modifications, and parent ion charge. The library was constructed so that experimental peptide tandem mass spectra could be compared with those in the library, resulting in a peptide sequence identification based on scoring the similarity of the experimental spectrum with the contents of the library. A software implementation that performs this type of library search was constructed and successfully used to obtain sequence identifications. The annotated tandem mass spectrum libraries for the Homo sapiens, Mus musculus, and Saccharomyces cerevisiae proteomes and search software were made available for download and use by other groups.
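The library construction and search described above can be sketched in a few lines. This is an illustrative sketch, not the published software: spectra are binned to unit m/z, the averaging is a plain mean of binned intensities, and the similarity score is a cosine (normalized dot product), which is an assumption since the abstract does not name the scoring function. All function names are hypothetical.

```python
# Sketch: build a consensus spectral library and search it by spectrum similarity.
# Assumptions: unit-width m/z bins, mean-intensity consensus, cosine similarity scoring.
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Convert (m/z, intensity) pairs into a {bin index: summed intensity} map."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return dict(binned)


def build_library(annotated_spectra):
    """Average spectra sharing the same (sequence, modifications, charge) key."""
    groups = defaultdict(list)
    for key, peaks in annotated_spectra:
        groups[key].append(bin_spectrum(peaks))
    library = {}
    for key, spectra in groups.items():
        summed = defaultdict(float)
        for spectrum in spectra:
            for index, intensity in spectrum.items():
                summed[index] += intensity
        library[key] = {i: total / len(spectra) for i, total in summed.items()}
    return library


def cosine(a, b):
    """Cosine similarity between two binned spectra; 0.0 if either is empty."""
    dot = sum(value * b.get(index, 0.0) for index, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(library, peaks):
    """Return the best-scoring library key and its similarity score."""
    query = bin_spectrum(peaks)
    best_key = max(library, key=lambda key: cosine(query, library[key]))
    return best_key, cosine(query, library[best_key])
```

A real search would typically restrict comparisons to library entries whose parent ion mass and charge are compatible with the experimental spectrum before scoring; that filter is left out here to keep the sketch short.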