Identification of post-translational modifications (PTMs) is important to understanding the biological functions of proteins. MS/MS is a useful tool to identify
Most proteins undergo PTMs at multiple sites. The types and sites of PTMs in a protein vary widely and affect its cellular functions. Identification of all PTMs present in a protein is a key step toward understanding its biological functions and interactions inside a cell (1, 2). MS/MS (3, 4) allows rapid identification of many types of PTMs. However, data analysis and interpretation of MS/MS spectra for identification of PTMs remain a major challenge.

Early approaches to PTM identification using MS/MS involved exhaustive searches of all possible combinations of PTMs for each peptide from a protein database (5, 6). Because the search space grows exponentially as the number of PTMs increases, these early approaches performed a restrictive search that takes into account only a few types of PTMs during data analysis, ignoring all others. Investigators were obliged to guess the PTMs expected to exist in a sample prior to a search, and many potentially important PTMs may have been overlooked.

Various new approaches have been developed to increase the number of PTMs that can be identified during data analyses. VEMS (7) introduced an improved algorithm to reduce the search space, OpenSea (8) implemented a mass-based sequence alignment between database peptides and de novo interpretation, and TwinPeaks (9) improved the basic scoring scheme of SEQUEST (5), a popular database search program. But none of these approaches fully addressed the current limitations in the number of PTMs.

A few tools were recently introduced for blind PTM search. MS-Alignment (10) predicts PTMs expected in a sample by spectral alignment between a database peptide and a spectrum followed by InsPecT (11) search. ModifiComb (12) introduced a ΔM histogram between unassigned spectra and base peptides found in a database. These blind approaches predict PTMs based on the frequency of mass shifts (indicating potential PTMs) in a sample.
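The idea of predicting PTMs from the frequency of mass shifts can be illustrated with a minimal Python sketch. It is not the implementation of ModifiComb or MS-Alignment: the function names, the 0.1-Da bin width, the 200-Da shift limit, the count threshold, and all masses are hypothetical values chosen only for illustration.

```python
from collections import Counter


def delta_mass_histogram(unassigned_masses, base_masses,
                         bin_width=0.1, max_shift=200.0):
    """Bin the mass differences (unassigned spectrum minus base peptide).

    Returns a Counter mapping an integer bin index to the number of
    spectrum/peptide pairs whose mass difference falls in that bin.
    Differences larger than max_shift (in Da) are ignored.
    """
    counts = Counter()
    for m_unassigned in unassigned_masses:
        for m_base in base_masses:
            delta = m_unassigned - m_base
            if 0.0 < abs(delta) <= max_shift:
                counts[round(delta / bin_width)] += 1
    return counts


def frequent_shifts(counts, bin_width=0.1, min_count=2):
    """Return the binned mass shifts (Da) seen at least min_count times."""
    return sorted(round(index * bin_width, 1)
                  for index, n in counts.items() if n >= min_count)


# Hypothetical toy data: four base peptides identified in a database search.
base = [1000.5, 1300.6, 1600.7, 1900.8]
# Three unassigned spectra carry a shift of about +15.9949 Da and one
# carries a shift of about +79.9663 Da (illustrative values).
unassigned = [1016.4949, 1316.5949, 1616.6949, 1980.7663]

histogram = delta_mass_histogram(unassigned, base)
print(frequent_shifts(histogram))  # [16.0]
```

In this toy input the shift near +16.0 Da occurs three times and passes the threshold, whereas the shift near +80.0 Da occurs only once and is discarded, even though it is a genuine mass difference in the data.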
Thus, these blind approaches all share the intrinsic weakness of missing rare, infrequently observed PTMs that might provide important clues to understanding the function of a protein.

Although many approaches have been developed to take into account many types of PTMs, most assume that there will be a single variable PTM per peptide and ignore multiply modified peptides. In contrast, our studies with human glyceraldehyde-3-phosphate dehydrogenase (GAPDH) showed that a biological sample contains many multiply modified peptides.

Here we describe a new algorithm, named MODi, that identifies multiple PTMs in a peptide while placing virtually no limit