2019
DOI: 10.1109/tcbb.2017.2760829

Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems

Abstract: Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases every 3 days. We present Kmerind, a high performance parallel k-mer ind…

Cited by 19 publications (18 citation statements)
References 43 publications
“…Since the key features of fastv rely on unique k-mer mapping and extension, it is important to obtain high quality unique k-mer sets for microorganisms of interest. Although a number of k-mer generation tools are currently available [30,31], none are suitable for our application because we must both generate unique k-mers for tens of thousands of viruses and/or microorganisms, and filter the k-mer keys based on the reference genome. These unmet needs have led us to develop UniqueKMER, a new unique k-mer generation tool.…”
Section: Uniquekmer: Efficient Unique K-mer Generation For Large Data
Citation type: mentioning
confidence: 99%
“…Indeed, the availability of an arbitrary number of independent computation nodes allows to virtually extend to any size the data structure used to keep the k-mer statistics in memory, while using the network as a temporary buffer between the extraction phase and the aggregation phase. This is the approach followed by Kmernator [42] and Kmerind [43]. Both these tools are developed as MPI-based parallel applications and are able to handle data sets whose size is proportional to the overall memory of the MPI-based system where they are run.…”
Section: Distributed Systems
Citation type: mentioning
confidence: 99%
“…K-mer counting has been extensively studied over the past decade [8,58–65]. Counting is accomplished mainly through incremental updates to hash tables [8,58,64,65], including hash based probabilistic data structures [60–62] such as Bloom Filters [66] and Countmin Sketch [67], or through sorting and aggregation [59,63].…”
Section: Use Case and Related Work
Citation type: mentioning
confidence: 99%
“…Intra-task parallelism is achieved generally via concurrent updates of a shared data structure [8,60], while inter-task parallelism via data partitioning followed by sequential computation for each partition. Partitioning minimizes subsequent synchronization and may occur on disk [58,59,63,64], or in memory [59,61,63,65]. Many-core accelerators, such as GPGPU, may also be employed [64] for compute intensive phases.…”
Section: Use Case and Related Work
Citation type: mentioning
confidence: 99%