Families of FPGA-based accelerators for approximate string matching

Court, Tom Van; Herbordt, Martin C.

doi:10.1016/j.micpro.2006.04.001

Cited by 40 publications

(31 citation statements)

References 19 publications

“…In previous work we implemented a large number of variations of FPGA/DP [22]. Perhaps the most "vanilla" of these holds a query of size 150 and has an operating frequency of 40MHz.…”

Section: Implementation and Results (mentioning)

confidence: 99%

“…If the number of cells is greater than m, the size of the query string (see e.g. [22]), the FPGA algorithm runs in O(n). The constant is the time-per-character required to pump the database through the array.…”

Section: FPGA Algorithms (mentioning)

confidence: 99%
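
The O(n) claim in the statement above can be illustrated with a small software model of a linear systolic array. This is only a sketch under stated assumptions: it uses plain edit distance as a stand-in recurrence (the cited papers' actual cell logic and scoring are not given here), one cell per query character, and one database character entering the array per clock. The function name `systolic_edit_distance` is hypothetical.

```python
def systolic_edit_distance(query, database):
    """Simulate a linear systolic array with one cell per query character.

    Returns (edit_distance, clock_ticks). Edit distance is an assumed
    stand-in for the scoring recurrence; the cited designs may differ.
    """
    m, n = len(query), len(database)
    if m == 0 or n == 0:
        return max(m, n), 0

    # out1[j]: value cell j held after the previous clock.
    # out2[j]: value cell j held two clocks ago.
    # Before a cell sees its first character it holds the boundary D[0][j+1] = j+1.
    out1 = [j + 1 for j in range(m)]
    out2 = list(out1)

    clocks = n + m - 1  # the last character needs m-1 extra ticks to drain
    for t in range(clocks):
        new = list(out1)  # idle cells simply hold their value
        for j in range(m):
            i = t - j  # index of the database character reaching cell j now
            if not 0 <= i < n:
                continue
            up = out1[j]                         # D[i][j+1], own previous result
            left = out1[j - 1] if j else i + 1   # D[i+1][j], neighbor's last result
            diag = out2[j - 1] if j else i       # D[i][j], neighbor two clocks ago
            cost = 0 if query[j] == database[i] else 1
            new[j] = min(diag + cost, up + 1, left + 1)
        out2, out1 = out1, new

    return out1[m - 1], clocks


if __name__ == "__main__":
    dist, ticks = systolic_edit_distance("kitten", "sitting")
    print(dist, ticks)
```

In this model all m cells update in the same clock, so the tick count is n + m - 1 rather than the n·m cell updates a serial loop performs. With m fixed by the array size, run time grows linearly in the database length n, and the constant is the time per clock.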

“…Their drawbacks, which have prevented their adoption, are their brittleness and the lack of platforms available to the primary users. The first of these issues has been addressed in another recent study [22], while the latter is rapidly being addressed with the proliferation of FPGA-based computational platforms.…”

Section: Introduction (mentioning)

confidence: 99%

Single Pass, BLAST-Like, Approximate String Matching on FPGAs

Herbordt

Model

et al. 2006

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines

Approximate string matching is fundamental to bioinformatics and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic-programming (DP)-based methods. Our primary contributions are two new algorithms for emulating the seeding and extension phases of BLAST. These operate in a single pass through a database at streaming rate (110 Maa/sec on a VP70 for query sizes up to 600 and 170 Maa/sec on a Virtex4 for query sizes up to 1024), and with no preprocessing other than loading the query string. Further, they use very high sensitivity with no slowdown. While current DP-based methods also operate at streaming rate, generating results can be cumbersome. We address this with a new structure for data extraction. We present results from several implementations.

Single Pass, BLAST-Like, Approximate String Matching on FPGAs

Herbordt

Model

et al. 2006

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines

“…If the number of cells is greater than m, the size of the query string (see e.g. [25]), the FPGA algorithm runs in O(n). The constant is the time-per-character required to pump the database through the array.…”

Section: FPGA Algorithms (mentioning)

confidence: 99%

“…Their drawbacks, which have prevented their adoption, are their brittleness and the lack of platforms available to the primary users. The first of these issues has been addressed in another recent study [25], while the latter is rapidly being addressed with the proliferation of FPGA-based computational platforms.…”

Section: Introduction (mentioning)

confidence: 99%

Single pass streaming BLAST on FPGAs

et al. 2007

Self Cite

Approximate string matching is fundamental to bioinformatics and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic-programming (DP)-based methods. Our primary contribution is a new algorithm for emulating the seeding and extension phases of BLAST. This operates in a single pass through a database at streaming rate, and with no preprocessing other than loading the query string. Moreover, it emulates parameters turned to maximum possible sensitivity with no slowdown. While current DP-based methods also operate at streaming rate, generating results can be cumbersome. We address this with a new structure for data extraction. We present results from several implementations showing order-of-magnitude acceleration over serial reference code. A simple extension assures compatibility with NCBI BLAST.

Optimization schemes and performance evaluation of Smith–Waterman algorithm on CPU, GPU and FPGA

Zou

Dou

Xia

2011

Concurrency and Computation

With fierce competition between CPU and graphics processing unit (GPU) platforms, performance evaluation has become the focus of various sectors. In this paper, we take a well-known algorithm in the field of biosequence matching and database searching, the Smith-Waterman (S-W) algorithm, as an example, and demonstrate approaches that fully exploit its performance potential on CPU, GPU, and field-programmable gate array (FPGA) computing platforms. For CPU platforms, we perform two optimizations, single instruction, multiple data and multithreading, with compiler options, to gain over 70× speedup over naive CPU versions on quad-core CPU platforms. For GPU platforms, we propose the combination of coalesced global memory accesses, shared memory tiles, and loop unfolding, achieving 50× speedup over initial GPU versions on an NVIDIA GeForce GTX 470 card. Experimental results show that the GPU GTX 470 gains a 12× speedup, instead of the 100× reported by some studies, over the Intel quad-core CPU Q9400, under the same manufacturing technology and both with fully optimized schemes. In addition, for FPGA platforms, we customize a linear systolic array for the S-W algorithm in a 45-nm FPGA chip from Xilinx (XC6VLX760), with up to 1024 processing elements. At a clock rate of only 133 MHz, the FPGA platform reaches the highest performance and becomes the most power-efficient platform, using only 25 W compared with 190 W for the GPU GTX 470.

… higher performance/power computation ratio. In addition to customized circuit structures, similar to application-specific integrated circuit (ASIC) chips, FPGA chips are reconfigurable (i.e., the behaviors of FPGA chips can be changed dynamically during run time). Hence, FPGA-reconfigurable computing platforms combine some of the flexibility of software with the high performance of ASIC hardware. Despite many reports on the superiority of GPU or FPGA acceleration over CPU, there are still many open questions that cause confusion, including debates in some academic papers and website discussions [1][2][3][4][5]. On the website of NVIDIA (2701 San Tomas Expressway, Santa Clara, CA 95050, USA), a blog post [5] provides links to 10 users who have documented GPU performance increases of 100× or more. On FPGA acceleration, examples can be found in the FCCM (International Symposium on Field-Programmable Custom Computing Machines) Proceedings [6], which achieve speedups of orders of magnitude versus CPU implementations. However, some obvious shortcomings in the above performance comparisons have resulted in a misunderstanding of the features of three completely different computing architectures. First, the reported CPU performance results were measured from naive software versions without compiler optimization options. Second, the execution time was calculated without counting the time for transferring initial data to accelerators or collecting results from accelerators. Third, neither single instruction, multiple data (SIMD) instructions nor cache optimizations [7] were utilized in the evaluation of CPU performance. In cont...
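
The FPGA figures quoted in the abstract above (1024 processing elements, 133 MHz, 25 W versus 190 W) can be turned into a rough back-of-envelope calculation. The sketch below assumes each processing element completes one cell update per clock. That is an assumption for illustration, not a figure from the paper, so the result is an upper bound on peak throughput and not a measured rate. The function names are hypothetical.

```python
def peak_gcups(num_pes, clock_hz):
    """Peak giga cell updates per second, assuming one update per PE per clock."""
    return num_pes * clock_hz / 1e9


def power_ratio(reference_watts, device_watts):
    """How many times more power the reference platform draws than the device."""
    return reference_watts / device_watts


if __name__ == "__main__":
    # Values taken from the abstract: 1024 PEs, 133 MHz, 25 W FPGA, 190 W GPU.
    print(f"peak throughput bound: {peak_gcups(1024, 133e6):.3f} GCUPS")
    print(f"GPU/FPGA power ratio:  {power_ratio(190, 25):.1f}x")
```

Under that assumption the array tops out at about 136 GCUPS, and the GPU draws 7.6 times the power of the FPGA. Sustained throughput would be lower once data transfer and array fill and drain time are counted.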
