2020
DOI: 10.3389/fbioe.2020.556413

KITSUNE: A Tool for Identifying Empirically Optimal K-mer Length for Alignment-Free Phylogenomic Analysis

Abstract: Genomic DNA is the best "unique identifier" for organisms. Alignment-free phylogenomic analysis, a simple, fast, and efficient method for comparing genome sequences, relies on the distribution of short DNA sequences of a particular length, referred to as k-mers. The k-mer approach has been explored as a basis for sequence analysis applications, including assembly, phylogenetic tree inference, and classification. Although this approach is not novel, selecting the appropriate k-mer length to obtain the optim…
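As a rough illustration of the k-mer idea described in the abstract, the sketch below counts overlapping k-mers in a sequence and compares two sequences by the Jaccard distance between their k-mer sets. This is not KITSUNE's implementation or its method for choosing k: the function names `kmer_counts` and `jaccard_distance` are hypothetical, and Jaccard distance is only one simple alignment-free measure picked here for the example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq, skipping k-mers with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def jaccard_distance(counts_a, counts_b):
    """Return 1 - |shared k-mers| / |all k-mers| for two k-mer count tables."""
    set_a, set_b = set(counts_a), set(counts_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)
```

The choice of k changes the result: a very small k makes almost every k-mer appear in both sequences, while a very large k makes almost every k-mer unique, which is why the k-mer length has to be chosen with care.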

Citations: cited by 15 publications (11 citation statements)
References: 55 publications
“…Comparative genomics approaches (e.g. calculation of shared Kmer (Kmer overlap) [15,16], average nucleotide identity (ANI) [17], identification of genomic syntenic blocks [18]) have been increasingly utilized in taxonomic studies, aided by the development of lower cost high…”
Section: Introduction (mentioning)
confidence: 99%
“…Precise matching and alignment of typing schemes is required in the future to build on these results. Obtaining congruency to the genomic reference, particularly to either fast typing [37] or next generation sequencing methods employing single nucleotide polymorphism or k-mer [41, 42] based classification approaches, not only allows for fast incorporation of novel isolates into a mass spectra database, but also significantly reduces false identifications and classification artifacts of unknown strains associated to such newly described clades or lineages.…”
Section: Discussion (mentioning)
confidence: 99%
“…The size of k being an odd number avoids palindromic reverse-complement sequences. Further work could involve benchmarking of optimal length k-mers against core-genome SNP distances [38,39].…”
Section: Discussion (mentioning)
confidence: 99%
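The remark about odd k in the last statement above can be checked directly. A k-mer is a reverse-complement palindrome when it equals its own reverse complement; for odd k the middle base would have to be its own complement, which no DNA base is. The sketch below enumerates all k-mers for a small k and counts the palindromic ones. The helper names `reverse_complement` and `count_rc_palindromes` are hypothetical and are not taken from any of the cited papers.

```python
from itertools import product

# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an uppercase DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def count_rc_palindromes(k):
    """Count k-mers over ACGT that equal their own reverse complement."""
    total = 0
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        if kmer == reverse_complement(kmer):
            total += 1
    return total
```

Under these definitions, `count_rc_palindromes` returns 0 for every odd k, while for even k the first half of the k-mer determines the second half, giving 4^(k/2) palindromes. Exhaustive enumeration grows as 4^k, so this check is only practical for small k.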