2019
DOI: 10.1101/611137
Preprint

Benchmarking of alignment-free sequence comparison methods

Abstract: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five …

Cited by 31 publications (46 citation statements)
References 95 publications
“…For such data, standard two-phase methods that first compute an alignment and then compute a tree do not have acceptable accuracy, while PASTA [32], BAli-Phy [33], and other co-estimation methods are not fast. It is possible that alignment-free methods (see [34,35,36] for an entry into this topic) might provide good starting trees, but these have not been tested on ultra-large datasets (with thousands of species), and have instead mainly been focused on genome-scale analyses of tens of genomes. However, for any large dataset on which the starting trees cannot be reasonably accurately estimated quickly, blended DTM divide-and-conquer strategies may provide the best accuracy.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…While variants are typically discovered with short reads by mapping them to a target reference genome, one can also directly compare common subsequences among samples (Zielezinski et al, 2019).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…• Alignment based methods: These involve either shifting or insertion of gaps in sequences for alignment of two or more sequences, which make these methods computationally intensive.
• Alignment-free methods: These are computationally less intensive methods that consider the genome sequences as character strings and use distance-based methods involving frequency and distribution of bases [12][13][14].
Our focus in this paper is on alignment-free methodology, especially on using complexity measures for sequence comparisons.…”
Section: Genome Sequence Comparison (citation type: mentioning)
confidence: 99%
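
The last citation statement above describes alignment-free methods as treating sequences as character strings and computing distances from the frequency of bases. A minimal sketch of that general idea follows, assuming k-mer relative frequencies compared with a Euclidean distance. The function names (`kmer_frequencies`, `af_distance`), the choice of `k`, and the distance measure are illustrative assumptions, not the procedure of any tool benchmarked in the preprint or of the citing paper.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Relative frequency of every k-mer over `alphabet` found in `seq`.

    Hypothetical helper: k-mers containing other characters (e.g. N)
    are ignored.
    """
    seq = seq.upper()
    # Count all overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed k-mer order so vectors from different sequences are comparable.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def af_distance(seq_a, seq_b, k=2):
    """Euclidean distance between the k-mer frequency vectors (an assumed measure)."""
    u = kmer_frequencies(seq_a, k)
    v = kmer_frequencies(seq_b, k)
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    print(af_distance("ACGTACGTAC", "ACGTTCGTAC"))
```

No alignment or gap insertion is performed: each sequence is reduced to a fixed-length vector, so the cost grows linearly with sequence length rather than with the product of the two lengths.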