We describe pLink 2, a search engine with higher speed and reliability for proteome-scale identification of cross-linked peptides. With a two-stage open search strategy facilitated by fragment indexing, pLink 2 is ~40 times faster than pLink 1 and 3 to 10 times faster than Kojak. Furthermore, using simulated datasets, synthetic datasets, 15N metabolically labeled datasets, and entrapment databases, four analysis methods were designed to evaluate the credibility of ten state-of-the-art search engines. This systematic evaluation shows that pLink 2 outperforms the other search engines in precision and sensitivity, especially at proteome scales. Lastly, re-analysis of four published proteome-scale cross-linking datasets with pLink 2 required only a fraction of the time used by pLink 1, with up to 27% more cross-linked residue pairs identified. pLink 2 is therefore an efficient and reliable tool for cross-linking mass spectrometry analysis, and the systematic evaluation methods described here will be useful for future software development.
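The abstract above does not spell out how fragment indexing works, so the following is only a generic sketch of the idea, not pLink 2's implementation. It builds an inverted index from binned b/y fragment m/z values to peptide IDs, then ranks candidate peptides by how many query peaks they match. All names (`fragment_mzs`, `build_index`, `rank_candidates`), the bin width, and the toy peptide list are assumptions made for illustration.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da) for the few amino acids used in this toy example.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
PROTON = 1.007276
WATER = 18.010565
BIN_WIDTH = 0.02  # assumed bin width in m/z units; purely illustrative


def fragment_mzs(peptide):
    """Singly charged b- and y-ion m/z values of a linear peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    mzs = []
    prefix = 0.0
    for m in masses[:-1]:
        prefix += m
        mzs.append(prefix + PROTON)                  # b ion
        mzs.append(total - prefix + WATER + PROTON)  # complementary y ion
    return mzs


def build_index(peptides):
    """Inverted index: m/z bin -> set of peptide IDs with a fragment in that bin."""
    index = defaultdict(set)
    for pid, pep in enumerate(peptides):
        for mz in fragment_mzs(pep):
            index[round(mz / BIN_WIDTH)].add(pid)
    return index


def rank_candidates(index, peaks, top_n=2):
    """Count matched peaks per peptide (coarse score) and return the best top_n."""
    counts = defaultdict(int)
    for mz in peaks:
        b = round(mz / BIN_WIDTH)
        hits = set()
        for neighbour in (b - 1, b, b + 1):  # tolerate bin-boundary effects
            hits |= index.get(neighbour, set())
        for pid in hits:
            counts[pid] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


peptides = ["GASK", "PEPTLR", "VLEAK"]
index = build_index(peptides)
query_peaks = fragment_mzs("PEPTLR")  # idealised spectrum of one peptide
best = rank_candidates(index, query_peaks)
```

Because each peak lookup touches only a few bins instead of every peptide in the database, a coarse step of this kind can shortlist candidates cheaply before a slower fine-scoring step is applied to the shortlist.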
Disulfide bonds are vital for protein functions, but locating the linkage sites has been a challenge in protein chemistry, especially when the quantity of a sample is small or the complexity is high. In 2015, our laboratory developed a sensitive and efficient method for mapping protein disulfide bonds from simple or complex samples (Lu et al. in Nat Methods 12:329, 2015). This method is based on liquid chromatography–mass spectrometry (LC–MS) and a powerful data analysis software tool named pLink. To facilitate application of this method, we present step-by-step disulfide mapping protocols for three types of samples—purified proteins in solution, proteins in SDS-PAGE gels, and complex protein mixtures in solution. The minimum amount of protein required for this method can be as low as several hundred nanograms for purified proteins, or tens of micrograms for a mixture of hundreds of proteins. The entire workflow—from sample preparation to LC–MS and data analysis—is described in great detail. We believe that this protocol can be easily implemented in any laboratory with access to a fast-scanning, high-resolution, and accurate-mass LC–MS system.
High-throughput proteomics based on mass spectrometry (MS) analysis has permeated biomedical science and propelled numerous research projects. pFind 3 is a database search engine for high-speed and in-depth proteomics data analysis. pFind 3 features a swift open search workflow that is adept at uncovering less obvious information such as unexpected modifications or mutations that would have gone unnoticed using a conventional data analysis pipeline. In this protocol, we provide step-by-step instructions to help users master various types of data analysis using pFind 3 in conjunction with pParse for data pre-processing and, if needed, pQuant for quantitation. This streamlined pParse-pFind-pQuant workflow offers exceptional sensitivity, precision, and speed. It can be easily implemented in any laboratory in need of identifying peptides, proteins, or post-translational modifications, or of quantitation based on 15N-labeling, SILAC-labeling, or TMT/iTRAQ labeling.
In cross-linking mass spectrometry, the identification of cross-linked peptide pairs relies heavily on the ability of a database search engine to measure the similarities between experimental and theoretical MS/MS spectra. However, the lack of accurate ion intensities in theoretical spectra impairs the performance of search engines, particularly at proteome scale. Here we introduce pDeepXL, a deep neural network to predict MS/MS spectra of cross-linked peptide pairs. To train pDeepXL, we used transfer learning because it facilitated training with limited benchmark data of cross-linked peptide pairs. Test results on more than ten data sets showed that pDeepXL accurately predicted the spectra of both noncleavable DSS/BS3/Leiker cross-linked peptide pairs (>80% of predicted spectra have Pearson’s r values higher than 0.9) and cleavable DSSO/DSBU cross-linked peptide pairs (>75% of predicted spectra have Pearson’s r values higher than 0.9). pDeepXL also achieved accurate predictions on unseen data sets using an online fine-tuning technique. Lastly, integrating pDeepXL into a database search engine increased the number of identified cross-link spectra by 18% on average.
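The accuracy figures above are stated as the share of predicted spectra whose Pearson's r against the experimental spectrum exceeds 0.9. A minimal sketch of that metric follows, assuming each spectrum has already been reduced to a vector of intensities over the same set of fragment ions. The function names and the toy vectors are hypothetical and are not taken from pDeepXL.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length intensity vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def fraction_above(pairs, threshold=0.9):
    """Share of (predicted, experimental) pairs whose r exceeds the threshold."""
    rs = [pearson_r(pred, exp) for pred, exp in pairs]
    return sum(r > threshold for r in rs) / len(rs)


# Toy data: one well-predicted spectrum and one anti-correlated prediction.
pairs = [
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
]
share = fraction_above(pairs)
```

An undefined correlation (a constant vector) is returned as NaN and therefore never counts as exceeding the threshold, which is a conservative choice for this sketch.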
In mass spectrometry data analysis for identifying peptide pairs linked by N-hydroxysuccinimide (NHS) ester cross-linkers, search engines differ in how they define cross-linkable sites. Some restrict NHS ester cross-linkable sites to lysine (K) and the protein N-terminus, referred to as K only for short, whereas others additionally include serine (S), threonine (T), and tyrosine (Y) by default. Here, by setting amino acids with chemically inert side chains such as glycine (G), valine (V), and leucine (L) as cross-linkable sites, which serves as a negative control, we show that software-identified STY-cross-links are only as reliable as GVL-cross-links. This is true across different NHS ester cross-linkers including DSS, DSSO, and DSBU, and across different search engines including MeroX, xiSearch, and pLink. Using a published data set originating from synthetic peptides, we demonstrate that STY-cross-links indeed have a high false discovery rate. Further analysis revealed that, depending on the data and the search engine used to analyze them, up to 65% of the STY-cross-links identified are actually K−K cross-links of the same peptide pairs, up to 61% are actually K-mono-links, and the rest tend to contain short peptides at high risk of false identification.