A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics

Halloran, John T.; Rocke, David M.

doi:10.1021/acs.jproteome.7b00767

Cited by 9 publications

(14 citation statements)

References 20 publications


“…We screened out significant risk genes through feature selection and optimization. An SVM model was trained using ten-fold crossvalidation [18]. The SVM model is a supervised classification algorithm of machine learning.…”

Section: Construction of Classification Model by SVM (mentioning)

confidence: 99%

Identification of risk genes related to myocardial infarction and the construction of early SVM diagnostic model

Song

Zheng

Xue

et al. 2021

International Journal of Cardiology

“…The second bottleneck is the execution time required to learn SVM parameters. Recent work [11] has tackled this bottleneck through software optimizations to Percolator's SVM learning engine, and our efforts complement and further improve upon these optimizations. On a massive data set containing over 215 million PSMs, the new version of Percolator achieves an overall speedup of 439% (81.4 h down to 18.6 h).…”

Section: Contributions (mentioning)

confidence: 99%

“…Finally, we optimized the CGLS solver itself using a mixture of low-level linear algebra function calls and software streamlining, as described previously [11]. Optimizations are compared against the recently described CGLS multithreaded speedup [11], referred to as CGLS-par. In contrast to the second in our series of optimizations, which uses multiple threads to parallelize runs of CGLS at the crossvalidation level, CGLS-par instead uses multiple threads to parallelize computation within the CGLS algorithm.…”

Section: Software Optimization (mentioning)

confidence: 99%

Speeding Up Percolator

et al. 2019

Self Cite


The processing of peptide tandem mass spectrometry data involves matching observed spectra against a sequence database. The ranking and calibration of these peptide-spectrum matches can be improved substantially using a machine learning postprocessor. Here, we describe our efforts to speed up one widely used postprocessor, Percolator. The improved software is dramatically faster than the previous version of Percolator, even when using relatively few processors. We tested the new version of Percolator on a data set containing over 215 million spectra and recorded an overall reduction to 23% of the running time as compared to the unoptimized code. We also show that the memory footprint required by these speedups is modest relative to that of the original version of Percolator.
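The Software Optimization statement above contrasts two ways of using threads: running separate CGLS solves concurrently, one per cross-validation fold, versus parallelizing the arithmetic inside a single CGLS solve. The following is a minimal Python sketch of the first pattern only. It is not Percolator's code; the function names (`cgls`, `solve_folds_in_parallel`) and the ridge parameter `lam` are hypothetical, chosen for illustration, and each fold is assumed to be a small dense `(A, b)` least-squares problem.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    # A @ v for a dense row-major matrix stored as a list of rows
    return [dot(row, v) for row in A]


def rmatvec(A, v):
    # A^T @ v
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]


def cgls(A, b, lam=0.0, tol=1e-12, max_iter=100):
    """Conjugate gradient for least squares: minimizes
    ||A x - b||^2 + lam * ||x||^2 without forming A^T A."""
    x = [0.0] * len(A[0])
    r = list(b)                # residual b - A x (x starts at zero)
    s = rmatvec(A, r)          # gradient direction A^T r - lam * x
    p = list(s)
    gamma = dot(s, s)
    for _ in range(max_iter):
        if gamma <= tol:       # normal-equation residual is small enough
            break
        q = matvec(A, p)
        alpha = gamma / (dot(q, q) + lam * dot(p, p))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = [si - lam * xi for si, xi in zip(rmatvec(A, r), x)]
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x


def solve_folds_in_parallel(folds, lam=0.0, workers=3):
    # Fold-level parallelism: each (A, b) pair is an independent solve,
    # so one worker thread is assigned per fold.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda fold: cgls(fold[0], fold[1], lam), folds))
```

Because the folds share no state, the solves need no synchronization beyond collecting the results. In pure Python the interpreter lock prevents a real speedup from threads, so this sketch shows only the structure of the approach; a compiled implementation would be needed to observe the timing gains the statement describes.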


“…Recent advances in machine learning tools and widespread use of high throughput techniques provides a massive amount of data as a source to develop tools for every step in MS-based workflows (Bouwmeester et al, 2020). For example, the post-processing tool Percolator (Käll et al, 2007; Halloran and Rocke, 2018) integrates several features into a semi-supervised learning algorithm to improve the distinction between true and false peptide-spectrum matches. Next to that, spectrum intensity predictors, such as MS²PIP (Degroeve et al, 2015; Gabriels et al, 2019) and Prosit (Gessulat et al, 2019) are new models that incorporate fragment ion intensities predictions as additional features next to the standard m/z ratio during spectral library searching to increase the resolution of the identification, even in challenging workflows such as proteogenomics (Verbruggen et al, 2021).…”

Section: Introduction (mentioning)

confidence: 99%

Ion Mobility Coupled to a Time-of-Flight Mass Analyzer Combined With Fragment Intensity Predictions Improves Identification of Classical Bioactive Peptides and Small Open Reading Frame-Encoded Peptides

Peeters

Baggerman

Gabriels

et al. 2021

Front. Cell Dev. Biol.


Bioactive peptides exhibit key roles in a wide variety of complex processes, such as regulation of body weight, learning, aging, and innate immune response. Next to the classical bioactive peptides, emerging from larger precursor proteins by specific proteolytic processing, a new class of peptides originating from small open reading frames (sORFs) have been recognized as important biological regulators. But their intrinsic properties, specific expression pattern and location on presumed non-coding regions have hindered the full characterization of the repertoire of bioactive peptides, despite their predominant role in various pathways. Although the development of peptidomics has offered the opportunity to study these peptides in vivo, it remains challenging to identify the full peptidome as the lack of cleavage enzyme specification and large search space complicates conventional database search approaches. In this study, we introduce a proteogenomics methodology using a new type of mass spectrometry instrument and the implementation of machine learning tools toward improved identification of potential bioactive peptides in the mouse brain. The application of trapped ion mobility spectrometry (tims) coupled to a time-of-flight mass analyzer (TOF) offers improved sensitivity, an enhanced peptide coverage, reduction in chemical noise and the reduced occurrence of chimeric spectra. Subsequent machine learning tools MS2PIP, predicting fragment ion intensities and DeepLC, predicting retention times, improve the database searching based on a large and comprehensive custom database containing both sORFs and alternative ORFs. Finally, the identification of peptides is further enhanced by applying the post-processing semi-supervised learning tool Percolator. 
Applying this workflow, the first peptidomics workflow combined with spectral intensity and retention time predictions, we identified a total of 167 predicted sORF-encoded peptides, of which 48 originating from presumed non-coding locations, next to 401 peptides from known neuropeptide precursors, linked to 66 annotated bioactive neuropeptides from within 22 different families. Additional PEAKS analysis expanded the pool of SEPs on presumed non-coding locations to 84, while an additional 204 peptides completed the list of peptides from neuropeptide precursors. Altogether, this study provides insights into a new robust pipeline that fuses technological advancements from different fields ensuring an improved coverage of the neuropeptidome in the mouse brain.

