2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2021
DOI: 10.1109/ipdps49936.2021.00061

Distributed-Memory k-mer Counting on GPUs

Abstract: A fundamental step in many bioinformatics computations is to count the frequency of fixed-length sequences, called k-mers, a problem that has received considerable attention as an important target for shared memory parallelization. With datasets growing at an exponential rate, distributed memory parallelization is becoming increasingly critical. Existing distributed memory k-mer counters do not take advantage of GPUs for accelerating computations. Additionally, they do not employ domain-specific optimizations …
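To make the counting problem in the abstract concrete, here is a minimal single-process sketch in Python. It is only an illustration of what "counting k-mers" means, not the paper's distributed GPU method; the function name `count_kmers` and the example read are assumptions made for this sketch, and it does not merge a k-mer with its reverse complement, which many counters do.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a list of reads.

    Hypothetical helper for illustration: serial, CPU-only, and without
    reverse-complement canonicalization.
    """
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    # "ACGTACGT" contains the 4-mer "ACGT" twice (positions 0 and 4).
    print(count_kmers(["ACGTACGT"], 4))
```

Every position in every read contributes one table update, which is why the size of the input and the memory held by the count table dominate the cost as datasets grow.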

Citations: cited by 6 publications (5 citation statements)
References: 35 publications (56 reference statements)
“…Host-Device communication is automatically handled by MetaHipMer, as k-mers are batched into compressed variants called supermers [21]. These supermers allow for efficient communication of k-mers across the network, and one kernel call is applied per supermer to unpack and insert the stored k-mers into the device hash table.…”
Section: Discussion
mentioning
confidence: 99%
“…This hashing helps achieve good load balance across processes. A special minimizer-based scheme is used to reduce the communication volume [21]. Using GPUs boosts performance, but further constrains memory (e.g.…”
Section: Dataset
mentioning
confidence: 99%
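The citation statements above mention a minimizer-based scheme and supermers without giving details. The sketch below shows one common way these ideas fit together, under stated assumptions: the minimizer of a k-mer is taken to be its lexicographically smallest m-mer, consecutive k-mers sharing a minimizer are packed into one supermer, and the destination process is chosen by a CRC32 hash of the minimizer. The ordering, the hash, and the names `minimizer`, `supermers`, `unpack`, and `owner_rank` are all hypothetical choices for illustration and are not taken from the cited paper or from MetaHipMer.

```python
import zlib


def minimizer(kmer, m):
    """Smallest m-mer inside a k-mer (lexicographic order is an assumption)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def supermers(read, k, m):
    """Split a read into maximal runs of consecutive k-mers with one minimizer.

    Returns (minimizer, supermer) pairs. A supermer holding j k-mers is a
    substring of length j + k - 1, shorter than j separate k-mers (j * k).
    """
    out = []
    start, current = 0, None
    for i in range(len(read) - k + 1):
        mz = minimizer(read[i:i + k], m)
        if current is None:
            current = mz
        elif mz != current:
            # Close the run covering k-mers start .. i-1.
            out.append((current, read[start:i - 1 + k]))
            start, current = i, mz
    if current is not None:
        out.append((current, read[start:]))
    return out


def unpack(supermer, k):
    """Recover the individual k-mers stored in a supermer."""
    return [supermer[i:i + k] for i in range(len(supermer) - k + 1)]


def owner_rank(mz, num_ranks):
    """Pick a destination process from the minimizer (CRC32 is an assumption)."""
    return zlib.crc32(mz.encode("ascii")) % num_ranks


if __name__ == "__main__":
    for mz, s in supermers("ACGTACGTTT", k=5, m=3):
        print(mz, s, unpack(s, 5), owner_rank(mz, 4))
```

Because every k-mer in a supermer shares the same minimizer, the whole supermer can be sent to a single owner and unpacked there, which is the sense in which such a scheme can reduce communication volume.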
“…Indeed, a naive indexing algorithm with the running time O(|Genome| * k) becomes prohibitively slow in the case of accurate HiFi reads because mapping these reads is based on large k-mer sizes (e.g., k = 300). Jellyfish (Marçais and Kingsford 2011), KMC3 (Kokot et al 2017), and more scalable GPU-based k-mer counting approaches (Nisa et al 2021) generate a database of counts that allows a constant-time count query for any k-mer. However, even though one can rapidly generate a counting database, existing implementations for indexing all rare k-mers still require O(|Genome| * k) time.…”
mentioning
confidence: 99%