2016
DOI: 10.1007/s11227-016-1835-3

An effective extension of the applicability of alignment-free biological sequence comparison algorithms with Hadoop

Abstract: Alignment-free methods are one of the mainstays of biological sequence comparison, i.e., the assessment of how similar two biological sequences are to each other, a fundamental and routine task in computational biology and bioinformatics. They have gained popularity since, even on standard desktop machines, they are faster than methods based on alignments. However, with the advent of Next-Generation Sequencing Technologies, datasets whose size, i.e., number of sequences and their total length, is a challenge t…

Cited by 25 publications (15 citation statements)
References 41 publications
“…To further speed-up ‘Multi-SpaM’, we have parallelized our software to run on multiple cores; in Table 1, we report both wall-clock and ‘CPU’ run times. It should be straight-forward to adapt our software to run on distributed systems, as has been done for other alignment-free approaches (66, 67).…”
Section: Discussion (mentioning)
confidence: 99%
“…Normalization of the background and including inexact matches increases the time complexity. Alignment-free methods based on k-mers lend themselves to parallel algorithms, and parallel computational methods have been applied to achieve speedup and scalability for alignment-free methods (90). When k is large, memory is a main limitation for storing k-mer counts and computing alignment-free measures (91).…”
Section: Applications Of the Alignment-free Methods To Comparative Ge… (mentioning)
confidence: 99%
“…The modern high-throughput technologies produce high amounts of sequence collections of data, and several methodologies have been proposed for their efficient storage and analysis [34,35]. Recently, approaches based on MapReduce and big data technologies have been proposed (see, e.g., [36], and [3] for a complete review on this topic). An important issue in this context is the computation of k-mer statistics, that becomes challenging when sets of sequences at a genomic scale are involved.…”
Section: Big Data Based Approaches For the Analysis Of Biological Seq… (mentioning)
confidence: 99%