2019
DOI: 10.1093/bioinformatics/btz147
SW-Tandem: a highly efficient tool for large-scale peptide identification with parallel spectrum dot product on Sunway TaihuLight

Abstract: Tandem mass spectrometry-based database searching is a widely acknowledged and adopted method for identifying peptide sequences in shotgun proteomics. However, database searching is extremely computationally expensive and can take days or even weeks to process a large spectra dataset. To address this critical issue, this paper presents SW-Tandem, a new tool for large-scale peptide sequencing. SW-Tandem parallelizes the spectrum dot product scoring algorithm and leverages the advantages…
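Since the abstract centers on the spectrum dot product (SDP) kernel, a minimal serial sketch may help. It assumes the standard formulation, in which both spectra are binned into intensity vectors over m/z and the score is their inner product; all names and the 1 Da bin width are illustrative assumptions, not SW-Tandem's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Accumulate a peak list (m/z, intensity) into fixed-width m/z bins.
std::vector<double> binSpectrum(const std::vector<std::pair<double, double>>& peaks,
                                double maxMz, double binWidth = 1.0) {
    std::vector<double> bins(static_cast<std::size_t>(maxMz / binWidth) + 1, 0.0);
    for (const auto& [mz, intensity] : peaks) {
        std::size_t b = static_cast<std::size_t>(mz / binWidth);
        if (b < bins.size()) bins[b] += intensity;
    }
    return bins;
}

// SDP score: inner product of the binned experimental and theoretical spectra.
// A tight loop like this is the kind of kernel the paper vectorizes and parallelizes.
double spectrumDotProduct(const std::vector<double>& experimental,
                          const std::vector<double>& theoretical) {
    double score = 0.0;
    const std::size_t n = std::min(experimental.size(), theoretical.size());
    for (std::size_t i = 0; i < n; ++i)
        score += experimental[i] * theoretical[i];
    return score;
}
```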

Cited by 13 publications (11 citation statements)
References 7 publications
“…acquisition time to a few minute range [12][13][14] and increasing the throughput of data processing [15][16][17][18][19][20].…”
Section: Introduction
confidence: 99%
“…Therefore, solving these computational problems requires data structures such as hash tables, graphs, and sparse unstructured matrices, whose computations exhibit little memory locality and naturally lead to unpredictable communication patterns both in space (arbitrary connections between computing components) and in time (the processes or threads may need data from one another at any point). Further, the existing high-performance computing methods [198], [199], [200] have been built around inherently serial designs in which the database is replicated on all parallel nodes and the experimental data are split among them. This strategy is not scalable, owing to the space complexity of indexing proteome databases with multiple PTMs (especially the fragment-ion index) [201], or when multiple proteome database searches are required for systems biology experiments.…”
Section: A. Proteogenomic Tools
confidence: 99%
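For context on the replicate-database/split-data design this passage critiques, a minimal sketch of the pattern follows: every rank loads the full peptide database and then searches only its own contiguous block of spectra. The struct and even block partition are illustrative assumptions, not code from any cited tool.

```cpp
#include <algorithm>
#include <cstddef>

struct Slice { std::size_t begin, end; };  // half-open range [begin, end)

// Divide numSpectra as evenly as possible across `size` ranks; the database
// itself is simply replicated, which is the scalability concern raised above.
Slice mySlice(std::size_t numSpectra, int rank, int size) {
    std::size_t base = numSpectra / static_cast<std::size_t>(size);
    std::size_t rem  = numSpectra % static_cast<std::size_t>(size);
    std::size_t r    = static_cast<std::size_t>(rank);
    // The first `rem` ranks take one extra spectrum each.
    std::size_t begin = r * base + std::min(r, rem);
    std::size_t len   = base + (r < rem ? 1 : 0);
    return {begin, begin + len};
}
```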
“…These studies include Parallel Tandem [198], which spawns multiple instances of the original X!Tandem on distributed machines; X!!Tandem [209], which achieves parallelism using owner-compute MPI processes; MR-Tandem [210], which uses Map-Reduce instead of MPI for better speedup efficiency; MCtandem [211], which employs the Intel Many Integrated Core (MIC) co-processor architecture to speed up spectral dot products (SDP) for X!Tandem; and SW-Tandem [199], which employs the Haswell AVX2 engine to speed up SDP computations on the Sunway TaihuLight supercomputer. SW-Tandem also spawns a manager process that distributes the experimental data to worker processes via a global queue for better load balancing.…”
Section: A. Limitation
confidence: 99%
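The global-queue load balancing this passage attributes to SW-Tandem can be sketched as an MPI manager/worker loop: rank 0 hands out spectrum-batch indices on demand, so faster workers naturally consume more batches. The tags, batch count, and the searchBatch hook are hypothetical, not the tool's actual interface.

```cpp
#include <mpi.h>
#include <queue>

enum { TAG_REQUEST = 1, TAG_WORK = 2, TAG_DONE = 3 };

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int numBatches = 1000;  // assumed number of spectrum batches

    if (rank == 0) {
        // Manager: keep a global queue of batch indices, serve them on demand.
        std::queue<int> work;
        for (int b = 0; b < numBatches; ++b) work.push(b);
        int active = size - 1;
        while (active > 0) {
            int req = 0;
            MPI_Status st;
            MPI_Recv(&req, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, &st);
            if (!work.empty()) {
                int batch = work.front();
                work.pop();
                MPI_Send(&batch, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
            } else {
                int stop = -1;  // queue drained: tell this worker to finish
                MPI_Send(&stop, 1, MPI_INT, st.MPI_SOURCE, TAG_DONE,
                         MPI_COMM_WORLD);
                --active;
            }
        }
    } else {
        // Worker: request a batch, search it against the local database copy,
        // repeat until the manager signals completion.
        for (;;) {
            int req = 0, batch = 0;
            MPI_Send(&req, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Status st;
            MPI_Recv(&batch, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_DONE) break;
            // searchBatch(batch);  // hypothetical: score this batch's spectra
        }
    }
    MPI_Finalize();
    return 0;
}
```

Dynamic dispatch of this kind avoids the imbalance of a static split when batches take uneven time to score.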
“…As demonstrated by other big data fields [23], such limitations can be reduced by developing parallel algorithms that combine the computational power of thousands of processing elements across distributed-memory clusters and supercomputers. We and others have developed high-performance computing (HPC) techniques for processing MS data, including for multicore [3], [2], [10], [9] and distributed-memory architectures [24], [25], [26], [27], [28], [29]. As with the serial algorithms, the objective of these HPC methods has been to speed up the arithmetic scoring part of the search engines by spawning multiple (managed) instances of the original code, replicating the theoretical database, and splitting the experimental data.…”
Section: Main
confidence: 99%