2014
DOI: 10.1186/s12859-014-0384-0

AKE - the Accelerated k-mer Exploration web-tool for rapid taxonomic classification and visualization

Abstract: Background: With the advent of low cost, fast sequencing technologies, metagenomic analyses are made possible. The large data volumes gathered by these techniques and the unpredictable diversity captured in them are still, however, a challenge for computational biology. Results: In this paper we address the problem of rapid taxonomic assignment with small and adaptive data models (< 5 MB) and present the accelerated k-mer explorer (AKE). Acceleration in AKE’s taxonomic assignments is achieved by a special machine le…

Cited by 8 publications (4 citation statements)
References 29 publications
“…Because of its ubiquitous usage for the field, various approaches for each of the domains exist and many of the applications incorporate parallelism. Furthermore, some applications exploiting algorithmic properties for fast computation exist which use e.g., k-mers instead of comparing string sequences, with some also using parallelization techniques (McHardy et al, 2007; Langenkämper et al, 2014). In database searches for instance, fast alignment algorithms are required that are capable to handle the increase in database sizes and the increase of queries to these databases.…”
Section: Methods (mentioning)
confidence: 99%
“…Comparing these frequencies is computationally easier than sequence alignment, and is an important method in alignment-free sequence analysis. The k-mer-based method is implemented in tools such as CLARK [71] or Kraken [31], GC-content is used in TAC-ELM [72], and oligonucleotide frequencies are used in TACOA [73], MetaID [74], or AKE [75]. When building classifiers, these features could eventually be extended by estimated open reading frame (ORF) length or/and density, codon usage, motifs, or repeats, such as microsatellites, transposons, or CRISPRs (clustered regularly interspaced short palindromic repeats) that could help to differentiate viral from non-viral sequences.…”
Section: Data Analysis Pipeline Design (mentioning)
confidence: 99%
“…These features can be used as inputs for machine learning models trained to predict classifications such as the taxonomic designation associated with sequences (Solis-Reyes et al 2018). Machine learning models that operate on k-mer input features have previously been applied in DNA barcode sequence classification and other predictive tasks (Kuksa and Pavlovic 2009; Langenkämper et al 2014; Ainsworth et al 2017; Cordier et al 2017). The application of these tools is often limited to specific taxonomic classification tasks (Kuksa and Pavlovic 2009), or they rely on user-provided sets of sequence data for model training (Langenkämper et al 2014).…”
Section: Introduction (mentioning)
confidence: 99%
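
The citation statements above refer to k-mer (oligonucleotide) frequencies as alignment-free input features for classifiers. As a rough illustration only, the following Python sketch turns a DNA sequence into a normalized k-mer frequency vector. It is not AKE's implementation or that of any tool named above; the function name `kmer_frequencies`, the default `k=4`, and the choice to skip windows containing non-ACGT characters are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4, alphabet="ACGT"):
    """Return relative k-mer frequencies in a fixed (lexicographic) order.

    Hypothetical helper for illustration; not taken from AKE or any cited tool.
    """
    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all possible k-mers so every sequence maps to the same vector layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Windows with ambiguous bases (e.g. N) are not in `kmers` and are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


if __name__ == "__main__":
    vector = kmer_frequencies("ACGTACGTNACGT", k=2)
    print(len(vector), round(sum(vector), 6))
```

A vector like this has length 4^k regardless of the input length, which is what makes it usable as a fixed-size feature for a machine learning model of the kind the statements describe.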