2018
DOI: 10.1146/annurev-biodatasci-080917-013431

Alignment-Free Sequence Analysis and Applications

Abstract: Genome and metagenome comparisons based on large amounts of next generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads. Alignment-free approaches based on the counts of word patterns in NGS data do not depend on the complete genome and are generally computationally efficient. Thus, they contribute significantly to genome and metagenome comparison. Recently, novel statistical approaches have been develop…
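
To make the idea of comparing samples by "counts of word patterns" concrete, here is a minimal Python sketch, assuming plain DNA reads given as strings. The function names (`kmer_counts`, `pooled_counts`, `cosine_dissimilarity`) are hypothetical, and cosine dissimilarity is used only as a simple stand-in; it is not necessarily one of the statistics the review discusses.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq, skipping words
    that contain anything other than A, C, G or T (e.g. N)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def pooled_counts(reads, k):
    """Sum word counts over a collection of short reads.
    No assembly or alignment of the reads is needed."""
    total = Counter()
    for read in reads:
        total.update(kmer_counts(read, k))
    return total


def cosine_dissimilarity(a, b):
    """1 - cosine similarity between two word-count vectors.
    Illustrative choice only (an assumption, not taken from the source)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty sample as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


# Toy samples: each is a small set of reads, not a complete genome.
sample_x = ["ACGTACGT", "CGTACG"]
sample_y = ["ACGTTCGT", "TTTTAC"]
d = cosine_dissimilarity(pooled_counts(sample_x, 3), pooled_counts(sample_y, 3))
```

Because the counts are pooled directly from the reads, the cost grows linearly with the total read length, which is the efficiency argument the abstract makes for word-count approaches.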

Citations: cited by 89 publications (78 citation statements)
References: 170 publications (268 reference statements)
“…Also the use of complete genome/gene sequences is computationally intensive and is practically infeasible for scaling up for matching a large number of genetic sequences. [5].…”
Section: Effect Of Data Length On Complexity Of Sequences (mentioning)
confidence: 99%
“…Thus, there is a need for fast alignment-free techniques for sequence analysis [3,4]. Further, one may have only short segments and/or incomplete fragments of nucleotide sequences to analyze [5]. Information theory and data compression algorithms provide a rich set of mathematical and algorithmic/computational tools to capture essential patterns in data that could be used for matching nucleotide sequences.…”
Section: Introduction (mentioning)
confidence: 99%
“…In recent years, a large number of alignment-free approaches to phylogeny reconstruction have been developed and applied, since these methods are much faster than traditional, alignment-based phylogenetic methods, see [51,39,3,25] for recent review papers and [50] for a systematic evaluation of alignment-free software tools. Most alignment-free approaches are based on k-mer statistics [21,44,7,48,17], but there are also approaches based on the length of common substrings [47,8,27,37,32,46], on word or spaced-word matches [38,33,35,34,1,41] or on so-called micro-alignments [49,20,29,28].…”
Section: Introduction (mentioning)
confidence: 99%
“…This approach has several limitations, including low sensitivity to detect remotely related species, slow computing, and low scalability (reviewed in (Zielezinski et al, 2017)). To overcome these limitations, several alignment-free methods were developed that are both computationally efficient and resistant to noise (Zielezinski et al, 2017;Ren et al, 2018;Jain et al, 2018). Based on their methodologies, there are three main categories of alignment-free genome similarity comparisons (reviewed in (Zielezinski et al, 2017)).…”
Section: Introduction (mentioning)
confidence: 99%