Unconventional epitopes presented by HLA class I complexes are emerging targets for T cell-targeted immunotherapies. Their identification by mass spectrometry (MS) has required the development of novel methods to cope with the large number of theoretical candidates. Methods to identify post‐translationally spliced peptides have led to a broad range of outcomes. Here, we investigated the impact of three common database search engine strategies (Mascot, Mascot+Percolator, and PEAKS DB) as the final identification step, as well as the features of the target database, on the ability to correctly identify non‐spliced and cis‐spliced peptides. We used ground truth datasets measured by MS to benchmark the methods' performance and extended the analysis to HLA class I immunopeptidomes. PEAKS DB showed better precision and recall for cis‐spliced peptides and identified a larger number of peptides in HLA class I immunopeptidomes than the other search engine strategies. The better performance of PEAKS DB appears to result from better discrimination between target and decoy hits, and hence a more robust FDR estimation, and seems independent of the peptide and spectrum features investigated here.
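The link between target/decoy discrimination, FDR estimation, and precision/recall against a ground truth can be illustrated with a minimal sketch. It is not the procedure used by Mascot, Percolator, or PEAKS DB; it assumes the simple decoys/targets FDR estimator, ignores score ties, and all function names, scores, and peptide labels are hypothetical.

```python
def decoy_fdr_threshold(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, one per peptide-spectrum match.
    The FDR at a cutoff is estimated as (#decoys / #targets) among all matches
    scoring at or above it (an assumed, simplified estimator).
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep extending the accepted set while the estimate stays in bounds.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def precision_recall(identified, ground_truth):
    """Precision and recall of identified peptides against a ground truth set."""
    identified, ground_truth = set(identified), set(ground_truth)
    true_pos = len(identified & ground_truth)
    precision = true_pos / len(identified) if identified else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical scores: three target hits and one decoy hit.
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
    print(decoy_fdr_threshold(example, max_fdr=0.0))
    print(precision_recall({"A", "B", "C"}, {"A", "B", "D", "E"}))
```

Under this estimator, a search engine that ranks decoys lower relative to targets can accept more matches before the FDR limit is reached, which is one way better target/decoy separation could translate into more identifications at the same nominal FDR.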
Noncanonical epitopes presented by Human Leucocyte Antigen class I (HLA-I) complexes to CD8+ T cells have attracted attention in research on novel immunotherapies against cancer, infection and autoimmunity. Proteasomes, which are the main producers of HLA-I-bound antigenic peptides, can catalyze both peptide hydrolysis and peptide splicing. The prediction of proteasome-generated spliced peptides still requires a reliable and large database of non-spliced and spliced peptides produced by these proteases. Here, we present an extended database of proteasome-generated spliced and non-spliced peptides, which was obtained by analyzing in vitro digestions of 80 unique synthetic polypeptide substrates, measured by different mass spectrometers. Peptides were identified through the invitroSPI method, which was validated through in silico and in vitro strategies. The peptide product database contains 16,631 unique peptide products (5,493 non-spliced, 6,453 cis-spliced and 4,685 trans-spliced peptide products), and a substrate sequence variety that makes it a valuable source for predictors of proteasome-catalyzed peptide hydrolysis and splicing. Potential artefacts and skewed results due to different identification and analysis strategies are discussed.
The discovery of many noncanonical peptides detectable with sensitive mass spectrometry inside, outside, and on cells has driven the development of novel methods for their identification, often without systematic benchmarking against other methods. Here, we propose iBench, a bioinformatic tool that can construct ground truth proteomics datasets and cognate databases, thereby generating a training court in which methods, search engines, and proteomics strategies can be tested and their performance estimated by the same tool. iBench can be coupled to the main database search engines, allows the selection of customized features of mass spectrometry spectra and peptides, provides standard benchmarking outputs, and is open source. The proof‐of‐concept application to tryptic proteome digestions, immunopeptidomes, and synthetic peptide libraries dissected the impact that noncanonical peptides could have on the identification of canonical peptides by Mascot search with rescoring via Percolator (Mascot+Percolator).