CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units

Liu, Yongchao; Maskell, Douglas L.; Schmidt, Bertil

doi:10.1186/1756-0500-2-73

Cited by 224 publications

(191 citation statements)

References 21 publications

Supporting

Mentioning: 183

Contrasting

Unclassified


“…Several alternative implementations for accelerating the Smith-Waterman algorithm using FPGAs ([14], [15]), vector operations on x86 CPUs [16], and GPUs (e.g., using CUDA [17]) exist. However, because the PaPaRa alignment kernel differs significantly from the standard Smith-Waterman implementation, we omit a more detailed review at this point.…”

Section: Related Work (mentioning)

confidence: 99%

Accelerating Phylogeny-Aware Short DNA Read Alignment with FPGAs

Alachiotis

Berger

Stamatakis

2011

2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines


Abstract: Recent advances in molecular sequencing technology have given rise to novel algorithms for simultaneously aligning short sequence reads to reference sequence alignments and corresponding evolutionary reference trees. We present a complete hardware/software implementation for the acceleration of a program called PaPaRa, a newly introduced dynamic programming algorithm for this purpose. We verify the correctness of the proposed architecture on a real FPGA and introduce a straightforward communication protocol (using gigabit Ethernet) for seamless integration with the encapsulating steering software that is executed on a PC processor. The hardware description and the software implementation are freely available for download. When mapped to a Virtex 6 FPGA, our reconfigurable architecture can compute 133.4 billion cell updates per second for the novel, tree-based alignment kernel of PaPaRa. Compared to PaPaRa running on a 3.2 GHz Intel Core i5 CPU, we obtain speedups for the alignment kernel that range between 170 and 471. For the entire application, that is, the alignment kernel and the trace-back step, we obtain speedups between 74 and 118.

“…
Reference | Device | Database searched | Performance
(Liu, Schmidt, Voss, Schroder & Muller-Wittig, 2006) | GTX 7800 | Swiss-Prot | 650 MCUPS
(Liu, Huang, Johnson & Vaidya, 2006) | GTX 7800 | 983 protein sequences | 241 MCUPS
(Manavski & Valle, 2008) | GTX 8800 | Swiss-Prot | 1.9 GCUPS
(Liu et al, 2009) | GTX 280 | Swiss-Prot | 9.5 GCUPS
(Liu et al, 2010) | GTX 280 | Swiss-Prot | 17 GCUPS
(Kentie, 2010) | GTX 275 | Swiss-Prot | 21.4 GCUPS
…”

Section: Methods (mentioning)

confidence: 99%
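The figures in the statement above mix two units, MCUPS and GCUPS (millions and billions of dynamic-programming cell updates per second). As a minimal sketch of how such a rate is usually derived, assuming one cell update per query-residue/database-residue pair, the following helpers compute and convert it. The function names and the workload numbers are hypothetical and are not taken from any of the cited papers.

```python
def gcups(query_length, database_residues, seconds):
    """Return billions of cell updates per second (GCUPS).

    Assumes the alignment fills one dynamic-programming cell for every
    (query residue, database residue) pair.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    cells = query_length * database_residues
    return cells / seconds / 1e9


def mcups_to_gcups(mcups):
    """Convert millions of cell updates per second to billions."""
    return mcups / 1000.0


# Hypothetical workload: a 400-residue query against a database holding
# 100 million residues in total, searched in 8 seconds.
rate = gcups(query_length=400, database_residues=100_000_000, seconds=8.0)
print(f"{rate:.2f} GCUPS")

# Putting a quoted MCUPS figure on the same scale as the GCUPS rows.
print(f"{mcups_to_gcups(650):.2f} GCUPS")
```

Converting everything to one unit makes the rows directly comparable; for example, 650 MCUPS is 0.65 GCUPS.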

“…Furthermore, it is shown to scale almost linearly with the amount of GPUs used by simply splitting up the database. Various improvements have been suggested to the approach presented in (Manavski & Valle, 2008), as shown in (Akoglu & Striemer, 2009; Liu et al, 2009). In (Liu et al, 2009), for sequences of more than 3,072 amino acids an 'inter-task parallelization' method similar to the systolic array and OpenGL approaches is used as this, while slower, requires less memory.…”

Section: Current Implementations (mentioning)

confidence: 99%

“…Various improvements have been suggested to the approach presented in (Manavski & Valle, 2008), as shown in (Akoglu & Striemer, 2009; Liu et al, 2009). In (Liu et al, 2009), for sequences of more than 3,072 amino acids an 'inter-task parallelization' method similar to the systolic array and OpenGL approaches is used as this, while slower, requires less memory. The 'CUDASW++' solution presented in (Liu et al, 2009) manages a maximum speed of about 9.5 GCUPS searching Swiss-Prot on a GeForce GTX 280 graphics card.…”

Section: Current Implementations (mentioning)

confidence: 99%

“…In (Liu et al, 2009), for sequences of more than 3,072 amino acids an 'inter-task parallelization' method similar to the systolic array and OpenGL approaches is used as this, while slower, requires less memory. The 'CUDASW++' solution presented in (Liu et al, 2009) manages a maximum speed of about 9.5 GCUPS searching Swiss-Prot on a GeForce GTX 280 graphics card. An improved version, 'CUDASW++ 2.0' has been published recently (Liu et al, 2010).…”

Section: Current Implementations (mentioning)

confidence: 99%
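The statements above describe a length cutoff of 3,072 amino acids that decides which parallelization method handles a database sequence. A rough sketch of that kind of partition is given here. It only illustrates the idea of routing sequences by length; `split_by_length` and the sample data are hypothetical, and nothing in it is taken from the CUDASW++ source code.

```python
LENGTH_THRESHOLD = 3072  # cutoff quoted in the statements above


def split_by_length(sequences, threshold=LENGTH_THRESHOLD):
    """Partition sequences into (at_or_below, above) the length threshold.

    Each group would then be handed to a different alignment kernel;
    which kernel handles which group is left to the caller.
    """
    at_or_below, above = [], []
    for seq in sequences:
        if len(seq) > threshold:
            above.append(seq)
        else:
            at_or_below.append(seq)
    return at_or_below, above


# Hypothetical database: two ordinary proteins and one very long one.
database = ["MKT" * 100, "ACDEFGHIKL" * 30, "M" * 4000]
short_group, long_group = split_by_length(database)
print(len(short_group), "sequences at or below the threshold")
print(len(long_group), "sequences above the threshold")
```

Keeping the threshold as a parameter makes it easy to experiment with other cutoffs for a different device or memory budget.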
