SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs

Alser, Mohammed; Shahroodi, Taha; Gómez-Luna, Juan; Alkan, Can; Mutlu, Onur

doi:10.48550/arxiv.1910.09020

Cited by 4 publications

(20 citation statements)

References 30 publications

(58 reference statements)

“…Several recent works propose approaches and techniques to directly or indirectly accelerate or improve the accuracy of metagenomics profiling, the first step of such studies. These works take three approaches: (1) Reducing the reference database's size by pre-alignment filtering [86,87] or heuristics for taxonomic classification techniques [55,88-91], (2) Accelerating read alignment or assembly (only for alignment-/assembly-based profilers) on CPUs, FPGAs, or GPUs [92-98], (3) post-alignment/-assembly/-classification presence and abundance estimation heuristics [54,55,99]. Demeter is categorized in the first group, taking a HDC-based approach for the first time.…”

Section: Metagenomic Profilers (mentioning)

confidence: 99%

Demeter: A Fast and Energy-Efficient Food Profiler using Hyperdimensional Computing in Memory

Shahroodi, Zahedi, Fırtına, et al. 2022

Preprint

Self Cite

Food profiling is an essential step in any food monitoring system needed to prevent health risks and potential frauds in the food industry. Significant improvements in sequencing technologies are pushing food profiling to become the main computational bottleneck. State-of-the-art profilers are unfortunately too costly for food profiling. Our goal is to design a food profiler that solves the main limitations of existing profilers, namely (1) working on massive data structures and (2) incurring considerable data movement, for a real-time monitoring system. To this end, we propose Demeter, the first platform-independent framework for food profiling. Demeter overcomes the first limitation through the use of hyperdimensional computing (HDC) and efficiently performs the accurate few-species classification required in food profiling. We overcome the second limitation by the use of an in-memory hardware accelerator for Demeter (named Acc-Demeter) based on memristor devices. Acc-Demeter actualizes several domain-specific optimizations and exploits the inherent characteristics of memristors to improve the overall performance and energy consumption of Acc-Demeter. We compare Demeter's accuracy with other industrial food profilers using detailed software modeling. We synthesize Acc-Demeter's required hardware using UMC's 65nm library by considering an accurate PCM model based on silicon-based prototypes. Our evaluations demonstrate that Acc-Demeter achieves a (1) throughput improvement of 192× and 724× and (2) memory reduction of 36× and 33× compared to Kraken2 and MetaCache (2 state-of-the-art profilers), respectively, on typical food-related databases. Demeter maintains an acceptable profiling accuracy (within 2% of existing tools) and incurs a very low area overhead.

“…To avoid examining dissimilar sequences at the downstream computationally-expensive read alignment step, a pre-alignment filter estimates the edit distance between every read and the regions of the reference at each read's candidate mapping locations, and uses this estimation to quickly decide whether or not read alignment is needed. If the sequences are dissimilar enough, significant amount of time is saved by avoiding the expensive alignment step [9,10,13,176,177].…”

Section: GenASM Framework (mentioning)

confidence: 99%
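
The filter-then-align decision described in the statement above can be illustrated with a minimal Python sketch. This is not SneakySnake, SHD, or any other filter cited on this page: it swaps in a much simpler technique, a character-count lower bound on edit distance, purely to show the control flow. The names `edit_distance_lower_bound`, `needs_alignment`, and the parameter `threshold` are hypothetical and chosen for this example only.

```python
from collections import Counter


def edit_distance_lower_bound(read: str, ref: str) -> int:
    """Cheap lower bound on the edit distance between two sequences.

    Each edit (substitution, insertion, deletion) can remove at most one
    surplus character and at most one missing character, so the larger of
    the two totals can never exceed the true edit distance.
    """
    read_counts = Counter(read)
    ref_counts = Counter(ref)
    # Counter subtraction keeps only the positive differences.
    surplus = sum((read_counts - ref_counts).values())
    deficit = sum((ref_counts - read_counts).values())
    return max(surplus, deficit)


def needs_alignment(read: str, ref: str, threshold: int) -> bool:
    """Return False only when the pair provably exceeds the edit threshold.

    Assumption for this sketch: pairs that pass still go through full
    alignment, so the filter may accept dissimilar pairs but never rejects
    a pair whose true edit distance is within the threshold.
    """
    return edit_distance_lower_bound(read, ref) <= threshold
```

A bound this loose lets many dissimilar pairs through (any two sequences with the same base composition pass), which is why practical filters use tighter estimates. The structure is the same, though: a cheap estimate is compared against an edit distance threshold before the expensive alignment step runs.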

“…Examples of such filters are the Adjacency Filter [177] that is implemented for standard CPUs, SHD [176] that uses SIMD-capable CPUs, and GRIM-Filter [91] that is built in 3D-stacked memory. Many works also exploit the large amounts of parallelism offered by FPGA architectures for pre-alignment filtering, such as GateKeeper [10], MAGNET [11], Shouji [9], and SneakySnake [13]. A recent work, GenCache [122], proposes an in-cache accelerator to improve the filtering (i.e., seeding) mechanism of GenAx (for short reads) by using in-cache operations [1] and software modifications.…”

Section: Related Work (mentioning)

confidence: 99%

GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis

Cali, Kalsi, Bingöl, et al. 2020

Preprint

Self Cite

Genome sequence analysis has enabled significant advancements in medical and scientific areas such as personalized medicine, outbreak tracing, and the understanding of evolution. To perform genome sequencing, devices extract small random fragments of an organism's DNA sequence (known as reads). The first step of genome sequence analysis is a computational process known as read mapping. In read mapping, each fragment is matched to its potential location in the reference genome with the goal of identifying the original location of each read in the genome. Unfortunately, rapid genome sequencing is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data. A major contributor to this bottleneck is approximate string matching (ASM), which is used at multiple points during the mapping process. ASM enables read mapping to account for sequencing errors and genetic variations in the reads. We propose GenASM, the first ASM acceleration framework for genome sequence analysis. GenASM performs bitvector-based ASM, which can efficiently accelerate multiple steps of genome sequence analysis. We modify the underlying ASM algorithm (Bitap) to significantly increase its parallelism and reduce its memory footprint. Using this modified algorithm, we design the first hardware accelerator for Bitap. Our hardware accelerator consists of specialized systolic-array-based compute units and on-chip SRAMs that are designed to match the rate of computation with memory capacity and bandwidth, resulting in an efficient design whose performance scales linearly as we increase the number of compute units working in parallel. We demonstrate that GenASM provides significant performance and power benefits for three different use cases in genome sequence analysis. First, GenASM accelerates read alignment for both long reads and short reads. For long reads, GenASM outperforms state-of-the-art software and hardware accelerators by 116× and 3.9×, respectively, while reducing power consumption by 37× and 2.7×. For short reads, GenASM outperforms state-of-the-art software and hardware accelerators by 111× and 1.9×. Second, GenASM accelerates pre-alignment filtering for short reads, with 3.7× the performance of a state-of-the-art pre-alignment filter, while reducing power consumption by 1.7× and significantly improving the filtering accuracy. Third, GenASM accelerates edit distance calculation, with 22-12501× and 9.3-400× speedups over the state-of-the-art software library and FPGA-based accelerator, respectively, while reducing power consumption by 548-582× and 67×. We conclude that GenASM is a flexible, high-performance, and low-power framework, and we briefly discuss four other use cases that can benefit from GenASM.
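
The abstract above names Bitap as the bitvector-based ASM algorithm that GenASM modifies, without describing the base algorithm. As a rough illustration only, the following Python sketch implements the classic Bitap recurrence with up to `k` errors (the Wu-Manber formulation). It is not GenASM's modified, hardware-oriented version. The function name `bitap_search` and its return convention (end positions in the text) are assumptions made for this example.

```python
def bitap_search(text: str, pattern: str, k: int) -> list[int]:
    """Return end indices in `text` where `pattern` matches with at most k edits."""
    m = len(pattern)
    if m == 0:
        return []
    full = (1 << m) - 1      # keep only the m state bits
    accept = 1 << (m - 1)    # set when the whole pattern has been matched

    # masks[c] has bit i set when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)

    # state[d], bit i: pattern[:i + 1] matches a suffix of the text read so
    # far with at most d edits. Before any text is read, d deletions account
    # for the first d pattern characters.
    state = [((1 << d) - 1) & full for d in range(k + 1)]

    ends = []
    for pos, c in enumerate(text):
        mask = masks.get(c, 0)
        old = state[:]
        # Exact matching: extend every active prefix by one matching character.
        state[0] = ((old[0] << 1) | 1) & mask
        for d in range(1, k + 1):
            match = ((old[d] << 1) | 1) & mask
            insertion = old[d - 1]                 # extra character in the text
            substitution = (old[d - 1] << 1) | 1   # mismatched character
            deletion = (state[d - 1] << 1) | 1     # pattern character skipped
            state[d] = (match | insertion | substitution | deletion) & full
        if state[k] & accept:
            ends.append(pos)
    return ends
```

For example, `bitap_search("ACGTACGT", "GTA", 0)` returns `[4]`, the index where the exact occurrence of `GTA` ends. Every update is a shift, AND, or OR on one bitvector per error level, which is what makes this family of algorithms a natural candidate for hardware acceleration.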


“…This backtracking step involves irregular memory access patterns that are challenging for hardware implementation. Second, a few works [17,18] propose a filtering step before alignment, called pre-alignment filtering, to significantly speed up the end-to-end sequence alignment of (long) reads by heuristically replacing the need for expensive DP solutions for many inputs in the first place. These filters use a pre-defined edit distance threshold between the inputs and quickly determine whether or not an alignment (i.e., DP) should be granted.…”

Section: Introduction (mentioning)

confidence: 99%

SieveMem: A Computation-in-Memory Architecture for Fast and Accurate Pre-Alignment

Shahroodi, Miao, Zahedi, et al. 2023

2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors (ASAP)

With the industry moving towards sequencing of accurate long reads (as they favor accurate and more efficient reconstruction of DNA), finding solutions that support efficient analysis of these reads becomes more necessary. The long execution time required for sequence alignment of long reads negatively affects genomic studies relying on sequence alignment. Although pre-alignment filtering as an extra step before alignment was recently introduced to mitigate sequence alignment for short reads, these filters do not work as efficiently for long reads. Moreover, even with efficient pre-alignment filters, the overall end-to-end (i.e., filtering + original alignment) execution time of alignment for long reads remains high, while the filtering step is now a major portion of the end-to-end execution time. Our paper makes three contributions. First, it identifies data movement of sequences between memory units and computing units as the main source of inefficiency for pre-alignment filters of long reads. This is because although filters reject many of these long sequencing pairs before they get to the alignment stage, they still require a huge cost regarding time and energy consumption for the large data transferred between memory and processor. Second, this paper introduces an adaptation of a short-read pre-alignment filtering algorithm suitable for long reads. We call this LongGeneGuardian. Finally, it presents FilterFuse as an architecture that supports LongGeneGuardian inside the memory. FilterFuse exploits the Computation-In-Memory computing paradigm, eliminating the cost of data movement in LongGeneGuardian. Our evaluations show that FilterFuse improves the execution time of filtering by 120.47× for long reads compared to State-of-the-Art (SoTA) filter, SneakySnake. FilterFuse also improves the end-to-end execution time of sequence alignment by up to 49.14× and 5207.63× compared to SneakySnake with SoTA aligner and only SoTA aligner, respectively.
