Distributed-Memory k-mer Counting on GPUs

Nisa, Israt; Pandey, Prashant; Ellis, Marquita; Oliker, Leonid; Buluç, Aydın; Yelick, Katherine

doi:10.1109/ipdps49936.2021.00061

Cited by 6 publications

(5 citation statements)

References 35 publications

(56 reference statements)


“…Host-Device communication is automatically handled by MetaHipMer, as k-mers are batched into compressed variants called supermers [21]. These supermers allow for efficient communication of k-mers across the network, and one kernel call is applied per supermer to unpack and insert the stored k-mers into the device hash table.…”

Section: Discussion (mentioning)

confidence: 99%
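
The supermer batching mentioned in the statement above can be illustrated with a minimal sketch. It assumes a minimizer is the lexicographically smallest length-m substring of a k-mer and that a supermer is a run of consecutive k-mers sharing one minimizer. The names `minimizer`, `supermers`, and `unpack` are hypothetical, and plain strings stand in for the compressed encoding and the GPU kernel that the cited work describes.

```python
def minimizer(kmer: str, m: int) -> str:
    """Return the lexicographically smallest length-m substring (assumed ordering)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def supermers(read: str, k: int, m: int) -> list[tuple[str, str]]:
    """Group consecutive k-mers that share a minimizer into supermers.

    Returns (minimizer, supermer) pairs, where each supermer is the
    substring of the read that covers one run of consecutive k-mers.
    """
    out = []
    start = 0        # index of the first k-mer in the current run
    current = None   # minimizer shared by the current run
    for i in range(len(read) - k + 1):
        mz = minimizer(read[i:i + k], m)
        if current is None:
            current = mz
        elif mz != current:
            # The run holds k-mers start..i-1, i.e. read[start : i - 1 + k].
            out.append((current, read[start:i - 1 + k]))
            start, current = i, mz
    if current is not None:
        out.append((current, read[start:]))
    return out


def unpack(supermer: str, k: int) -> list[str]:
    """Recover the individual k-mers stored in one supermer."""
    return [supermer[i:i + k] for i in range(len(supermer) - k + 1)]
```

A run of n consecutive k-mers occupies n + k - 1 characters as a supermer instead of n * k characters as separate k-mers, which is where the reduction in communication volume comes from in this simplified picture.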

“…This hashing helps achieve good load balance across processes. A special minimizer-based scheme is used to reduce the communication volume [21]. Using GPUs boosts performance, but further constrains memory (e.g.…”

Section: Dataset (mentioning)

confidence: 99%

“…HPDA applications can greatly benefit from using filters on the GPU. For example, k-mer analysis is the very first data processing step in MetaHipMer and numerous other pipelines in computational biology [4,21]. In k-mer analysis, we start by parsing the raw sequencing data into length-k subsequences (called k-mers) and counting the occurrences of each k-mer using a GPU-based hash table.…”

Section: Introduction (mentioning)

confidence: 99%
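
As a rough CPU-side illustration of the k-mer analysis step described in the statement above, the sketch below parses reads into length-k substrings and counts them. A Python `Counter` stands in for the GPU-based hash table, the function name `count_kmers` is hypothetical, and details such as canonical (reverse-complement) k-mers are deliberately ignored.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts
```

For example, `count_kmers(["ACGTACGT"], 4)` sees five k-mers in total, with `ACGT` occurring twice and the other three occurring once each.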

“…In k-mer analysis, we start by parsing the raw sequencing data into length-k subsequences (called k-mers) and counting the occurrences of each k-mer using a GPU-based hash table. K-mer counting is used to weed out erroneous data (singleton k-mers) caused by sequencing errors, estimate sequencing depth, prepare sequencing data for assembly, and many other downstream tasks [21]. A filter is often used in k-mer counting to separate out singleton k-mers [18,20] before inserting them in the hash table.…”

Section: Introduction (mentioning)

confidence: 99%
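
The idea of placing a filter in front of the hash table, as described in the statement above, can be sketched as follows. This is an illustration under stated assumptions, not the cited implementation: an exact Python `set` stands in for an approximate filter such as a Bloom filter, a `dict` stands in for the hash table, and the name `count_with_filter` is hypothetical.

```python
from typing import Iterable


def count_with_filter(kmers: Iterable[str]) -> dict[str, int]:
    """Count k-mers, keeping singletons out of the hash table.

    A k-mer enters the table only on its second occurrence, so k-mers
    seen exactly once never occupy a table slot.
    """
    seen_once = set()  # stand-in for an approximate membership filter
    table = {}         # stand-in for the hash table of non-singletons
    for km in kmers:
        if km in table:
            table[km] += 1
        elif km in seen_once:
            # Second sighting: promote to the table with a count of 2.
            # The key stays in the filter, since Bloom-style filters
            # do not support deletion.
            table[km] = 2
        else:
            seen_once.add(km)
    return table
```

With a real approximate filter, false positives would occasionally promote a true singleton into the table with an overestimated count; the exact set used here avoids that, which is why this is only a sketch of the control flow.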


Singleton Sieving: Overcoming the Memory/Speed Trade-Off in Exascale κ-mer Analysis

McCoy, Hofmey, Yelick et al. 2023

SIAM Conference on Applied and Computational Discrete Algorithms (ACDA23)


Traditional filter data structures, such as Bloom filters, do not offer necessary features that modern high-performance data analytics applications need in order to efficiently perform complex data analysis tasks. For example, MetaHipMer, a de novo metagenome assembler, can use filters to weed out singleton k-mers and reduce memory usage by 30%-70%. However, the filter needs the ability to associate values with k-mers in order to perform the analysis in a single communication pass. Bloom filters do not support value associations and cause the application to perform an extra communication pass, thereby increasing the run time. Therefore, MetaHipMer faces a trade-off between memory and speed due to the limited capabilities of traditional filters. In this paper, we overcome the memory and speed trade-off in MetaHipMer by integrating a GPU-based feature-rich filter, the Two-Choice filter (TCF), in the MetaHipMer pipeline. The TCF uses key-value association to approximately store k-mers with extensions. This allows MetaHipMer to perform k-mer analysis on the GPUs in a single communication pass. Our empirical analysis shows a 50% reduction in memory usage in k-mer analysis on each node in MetaHipMer without any effect on the overall run time or assembly quality. The memory reduction in turn results in a 43% reduction in the number of nodes required to assemble datasets and enables MetaHipMer to scale to much larger datasets.


“…Indeed, a naive indexing algorithm with the running time O(|Genome| * k) becomes prohibitively slow in the case of accurate HiFi reads because mapping these reads is based on large k-mer sizes (e.g., k = 300). Jellyfish (Marçais and Kingsford 2011), KMC3 (Kokot et al. 2017), and more scalable GPU-based k-mer counting approaches (Nisa et al. 2021) generate a database of counts that allows a constant-time count query for any k-mer. However, even though one can rapidly generate a counting database, existing implementations for indexing all rare k-mers still require O(|Genome| * k) time.…”

mentioning

confidence: 99%

Fast and accurate mapping of long reads to complete genome assemblies with VerityMap

2022


Recent advancements in long-read sequencing have enabled the telomere-to-telomere (complete) assembly of a human genome and are now contributing to the haplotype-resolved complete assemblies of multiple human genomes. Since the accuracy of read mapping tools deteriorates in highly-repetitive regions, there is a need to develop accurate, error-exposing (detecting potential assembly errors), and diploid-aware (distinguishing different haplotypes) tools for read mapping in complete assemblies. We describe the first accurate, error-exposing, and partially diploid-aware VerityMap tool for long-read mapping to complete assemblies.
