The function of a large percentage of proteins is modulated by post-translational modifications (PTMs). Currently, mass spectrometry (MS) is the only proteome-wide technology that can identify PTMs. Unfortunately, failure to detect a PTM by MS is not proof that the modification is absent. The detectability of peptides varies significantly, making MS potentially blind to a large fraction of peptides. Learning from published algorithms, which generally focus on predicting the most detectable peptides, we developed a tool that incorporates protein abundance into the peptide prediction algorithm with the aim of determining the detectability of every peptide within a protein. We tested our tool, "Peptide Prediction with Abundance" (PPA), on data sets acquired in-house as well as on published data sets from other groups acquired on different instrument platforms. Incorporation of protein abundance into the prediction allows us to assess not only the detectability of all peptides but also whether a peptide of interest is likely to become detectable upon enrichment. We validated the ability of our tool to predict changes in protein detectability with a dilution series of 31 purified proteins at several different concentrations. PPA correctly predicted the concentration-dependent peptide detectability in 78% of the cases, demonstrating its utility for predicting the protein enrichment needed to observe a peptide of interest in targeted experiments. This is especially important in the analysis of PTMs. PPA is available as a web-based or executable package that can work with generally applicable defaults or be retrained from a pilot MS data set.
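The idea that detectability depends on both a peptide's intrinsic properties and the abundance of its parent protein can be illustrated with a minimal sketch. The logistic form, the weights, the `intrinsic_score` input, and the function names below are all assumptions made for illustration; they are not taken from PPA and do not describe its actual algorithm. The sketch only shows how an abundance term lets one ask how much enrichment a peptide of interest might need before it is likely to be observed.

```python
import math

# Hypothetical weights, chosen only for illustration; NOT taken from PPA.
W_INTRINSIC = 4.0   # weight on a sequence-derived detectability score in [0, 1]
W_ABUNDANCE = 1.5   # weight on log10 of the protein amount
BIAS = -3.0


def _linear_score(intrinsic_score: float, protein_amount: float) -> float:
    """Combine intrinsic detectability and log-abundance into one score."""
    if protein_amount <= 0:
        raise ValueError("protein_amount must be positive")
    return (W_INTRINSIC * intrinsic_score
            + W_ABUNDANCE * math.log10(protein_amount)
            + BIAS)


def detection_probability(intrinsic_score: float, protein_amount: float) -> float:
    """Toy logistic probability that a peptide is detected at a given amount."""
    z = _linear_score(intrinsic_score, protein_amount)
    return 1.0 / (1.0 + math.exp(-z))


def fold_enrichment_needed(intrinsic_score: float, protein_amount: float,
                           target: float = 0.5) -> float:
    """Smallest fold enrichment at which the toy model reaches `target`.

    Returns 1.0 when the peptide already meets the target probability.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    z_now = _linear_score(intrinsic_score, protein_amount)
    z_target = math.log(target / (1.0 - target))  # logit of the target
    if z_now >= z_target:
        return 1.0
    # Each 10-fold enrichment raises the score by W_ABUNDANCE.
    return 10.0 ** ((z_target - z_now) / W_ABUNDANCE)


if __name__ == "__main__":
    fold = fold_enrichment_needed(intrinsic_score=0.2, protein_amount=1.0)
    print(f"Estimated fold enrichment under the toy model: {fold:.1f}")
```

In this toy model, a peptide with a low intrinsic score can still cross the detection threshold once its parent protein is sufficiently enriched, which is the qualitative behavior that the dilution-series validation is designed to test.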
Post-translational modification (PTM) of proteins is a key regulatory mechanism in the vast majority of biological processes. Historically, following PTMs required the generation of site-specific antibodies, a time-consuming and laborious process associated with high failure rates. Mass spectrometry (MS) holds enormous promise in PTM analysis, as it is currently the only technique that can discover, localize, and quantify modifications proteome-wide (1). Recent advances in instrumentation and method optimization make it possible to detect the complete yeast proteome within one hour (2), an ever-increasing proportion of the human proteome (3-6), and more than 10,000 phosphorylation sites in a single MS experiment (7,8). As a result, one of the major publicly available databases (www.phosphosite.org (9)) has curated >200,000 phosphorylation sites. Although the number of proteins and PTMs that can be identified is impressive, many modifications have still not been identified in any MS-based experiment. The identification and quantification of biologically relevant modifications is challenging for three reasons: (1) many proteins of interest are of very low abundance, rendering them difficult to detect and quantify; (2) many modification sites are present at substoichiometric quantities, further reducing their detectability; and (3) as large scale proteomics is based on the detect...