A method to correlate the uninterpreted tandem mass spectra of peptides produced under low energy (10-50 eV) collision conditions with amino acid sequences in the Genpept database has been developed. In this method the protein database is searched to identify linear amino acid sequences within a mass tolerance of ± 1 u of the precursor ion molecular weight. A cross-correlation function is then used to provide a measurement of similarity between the mass-to-charge ratios for the fragment ions predicted from amino acid sequences obtained from the database and the fragment ions observed in the tandem mass spectrum. In general, a difference greater than 0.1 between the normalized cross-correlation functions of the first- and second-ranked search results indicates a successful match between sequence and spectrum. Searches of species-specific protein databases with tandem mass spectra acquired from peptides obtained from the enzymatically digested total proteins of E. coli and S. cerevisiae cells allowed matching of the spectra to amino acid sequences within proteins of these organisms. The approach described in this manuscript provides a convenient method to interpret tandem mass spectra with known sequences in a protein database. (J Am Soc Mass Spectrom 1994, 5, 976-989)
Amino acid sequence analysis is often the initial step in characterizing a newly isolated protein. Conventional sequencing strategies employ chemical reagents to remove one amino acid at a time from the amino terminus, followed by isolation and analysis of the released amino acid derivative [1, 2]. Limitations in the chemical efficiency of the process prevent determination of the complete sequence of a protein from small quantities of sample. Partial sequence information, however, can be used to search a protein or nucleotide database to discover relationships to previously identified proteins or to determine if the protein sequence is novel [3, 4].
Although sequence information may have been determined previously, the context in which the protein is identified may be relevant to the biological process under study [5]. Another method to identify known protein sequences employs site-specific proteolysis followed by measurement of the mass-to-charge ratios of the peptides by mass spectrometry. The set of observed peptide mass-to-charge ratios is then used to search a protein database to find a set of peptide masses predicted from enzymatic digestion of each protein in the database [6-10]. Both chemical degradation and peptide mapping approaches require the use of fairly homogeneous samples to avoid ambiguity in assigning
Address reprint requests to John R.
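The search strategy summarized in the first abstract above (select database subsequences within 1 u of the precursor mass, score predicted against observed fragment ions by cross-correlation, then compare the first- and second-ranked scores) can be illustrated with a minimal Python sketch. This is not the published implementation. All function names are hypothetical, and several simplifications are assumptions made here: only singly charged b- and y-ions are predicted, all predicted ions get equal intensity, spectra are binned to unit m/z, the background is averaged over shifts of ±75 bins, and scores are normalized by dividing by the top score. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (u).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[a] for a in seq) + WATER


def candidate_peptides(proteins, precursor_mass, tol=1.0):
    """All linear subsequences whose mass lies within +/- tol of the precursor."""
    found = set()
    for protein in proteins:
        for start in range(len(protein)):
            mass = WATER
            for end in range(start, len(protein)):
                mass += RESIDUE_MASS[protein[end]]
                if mass > precursor_mass + tol:
                    break  # extending further only adds mass
                if mass >= precursor_mass - tol:
                    found.add(protein[start:end + 1])
    return sorted(found)


def fragment_mz(seq):
    """Predicted m/z values of singly charged b- and y-ions (assumption: no other ion types)."""
    masses = [RESIDUE_MASS[a] for a in seq]
    ions, prefix, suffix = [], 0.0, 0.0
    for m in masses[:-1]:
        prefix += m
        ions.append(prefix + PROTON)          # b-ion
    for m in reversed(masses[1:]):
        suffix += m
        ions.append(suffix + WATER + PROTON)  # y-ion
    return ions


def binned(mz_values, intensities=None):
    """Unit-width m/z bins mapped to summed intensity."""
    spec = {}
    for i, mz in enumerate(mz_values):
        value = 1.0 if intensities is None else intensities[i]
        key = int(round(mz))
        spec[key] = spec.get(key, 0.0) + value
    return spec


def xcorr(observed, predicted, max_shift=75):
    """Zero-shift correlation minus the mean correlation over nonzero shifts."""
    def dot(shift):
        return sum(v * observed.get(b + shift, 0.0) for b, v in predicted.items())
    background = sum(dot(s) for s in range(-max_shift, max_shift + 1) if s != 0)
    return dot(0) - background / (2 * max_shift)


def rank_candidates(proteins, precursor_mass, obs_mz, obs_int=None, tol=1.0):
    """Score every candidate and return (score, peptide) pairs, best first."""
    observed = binned(obs_mz, obs_int)
    scored = [(xcorr(observed, binned(fragment_mz(p))), p)
              for p in candidate_peptides(proteins, precursor_mass, tol)]
    return sorted(scored, reverse=True)


def delta_cn(scored):
    """Difference between first and second scores after dividing by the top score."""
    if len(scored) < 2 or scored[0][0] <= 0:
        return None
    return (scored[0][0] - scored[1][0]) / scored[0][0]
```

Under these assumptions, a result would be accepted when `delta_cn(rank_candidates(...))` exceeds 0.1, mirroring the criterion quoted in the abstract. Real spectra would also need intensity preprocessing and handling of multiply charged ions, which this sketch leaves out.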
We describe a largely unbiased method for rapid and large-scale proteome analysis by multidimensional liquid chromatography, tandem mass spectrometry, and database searching by the SEQUEST algorithm, named multidimensional protein identification technology (MudPIT). MudPIT was applied to the proteome of the Saccharomyces cerevisiae strain BJ5460 grown to mid-log phase and yielded the largest proteome analysis to date. A total of 1,484 proteins were detected and identified. Categorization of these hits demonstrated the ability of this technology to detect and identify proteins rarely seen in proteome analysis, including low-abundance proteins like transcription factors and protein kinases. Furthermore, we identified 131 proteins with three or more predicted transmembrane domains, which allowed us to map the soluble domains of many of the integral membrane proteins. MudPIT is useful for proteome analysis and may be specifically applied to integral membrane proteins to obtain detailed biochemical information on this unwieldy class of proteins.
We describe a rapid, sensitive process for comprehensively identifying proteins in macromolecular complexes that uses multidimensional liquid chromatography (LC) and tandem mass spectrometry (MS/MS) to separate and fragment peptides. The SEQUEST algorithm, relying upon translated genomic sequences, infers amino acid sequences from the fragment ions. The method was applied to the Saccharomyces cerevisiae ribosome leading to the identification of a novel protein component of the yeast and human 40S subunit. By offering the ability to identify >100 proteins in a single run, this process enables components in even the largest macromolecular complexes to be analyzed comprehensively.