2014
DOI: 10.1016/j.gene.2014.05.043

K-mer natural vector and its application to the phylogenetic analysis of genetic sequences

Abstract: Based on the well-known k-mer model, we propose a k-mer natural vector model for representing a genetic sequence based on the numbers and distributions of k-mers in the sequence. We show that there exists a one-to-one correspondence between a genetic sequence and its associated k-mer natural vector. The k-mer natural vector method can be easily and quickly used to perform phylogenetic analysis of genetic sequences without requiring evolutionary models or human intervention. Whole or partial genomes can be hand…
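The abstract describes the k-mer natural vector only at a high level. The Python sketch below shows one way to build such a vector, assuming the definitions carry over from the original (1-mer) natural vector: for each of the 4^k k-mers over ACGT, its count n, the mean 1-based start position μ of its occurrences, and the normalized second central moment D = Σ(p − μ)² / (n·N), where N is the sequence length. The function name `kmer_natural_vector`, the zero convention for absent k-mers, and the skipping of windows that contain non-ACGT symbols are illustrative assumptions; the paper's exact normalization for k > 1 may differ.

```python
from itertools import product
from typing import List


def kmer_natural_vector(seq: str, k: int) -> List[float]:
    """Sketch of a k-mer natural vector (assumed definitions, see lead-in).

    Layout: [counts..., mean positions..., normalized 2nd moments...],
    each block ordered lexicographically over all 4**k k-mers of ACGT.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    n = len(seq)
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    positions = {km: [] for km in kmers}
    # Record the 1-based start position of every k-mer occurrence.
    for i in range(n - k + 1):
        word = seq[i:i + k]
        if word in positions:  # assumption: skip windows with non-ACGT symbols
            positions[word].append(i + 1)
    counts, means, moments = [], [], []
    for km in kmers:
        pos = positions[km]
        c = len(pos)
        counts.append(float(c))
        if c == 0:
            # Assumed convention: an absent k-mer contributes zeros.
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(pos) / c
        means.append(mu)
        # Second central moment of positions, normalized by count and length.
        moments.append(sum((p - mu) ** 2 for p in pos) / (c * n))
    return counts + means + moments
```

For example, `kmer_natural_vector("ATA", 1)` gives counts `[2, 0, 0, 1]` for A, C, G, T, mean positions `[2, 0, 0, 2]`, and a moment of 1/3 for A. Two sequences can then be compared through a distance between their vectors (Euclidean, for instance), with no alignment step.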

Citations: cited by 60 publications (54 citation statements)
References: 50 publications

“…In recent years, many studies have approached the investigation of DNA strings and genomes by means of algorithms, information theory and formal languages [11–22], and methods were developed for investigating whole genome structures. In particular, dictionaries of words occurring in genomes, distributions defined over genomes, and concepts related to word occurrences and frequencies have been very useful and seem to characterize important genomic features relevant in biological contexts [23–30]. Dictionaries are, in essence, finite formal languages.…”
mentioning
confidence: 99%
“…With the help of the guide tree, StrainSeeker is able to detect the last common ancestor on the sub-species level. For the present work, we used an alignment-free k-mer-based distance method similar to the k-mer natural vector method [11,12] to construct the tree of 2,758 bacterial strains. To test the accuracy of our tree, the E. coli sp.…”
Section: Advantages and Limitations Of Guide Tree-based Strain Detect
mentioning
confidence: 99%
“…Building the guide tree. We used k-mer-based alignment-free methods analogous to [11,12] to calculate the pairwise distance K between all pairs of genomes and to create the guide tree. All 2,758 available bacterial genomes from the NCBI RefSeq database (release 65) were used.…”
Section: Multi-locus Sequence Typing Of E. coli Samples
mentioning
confidence: 99%
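The guide-tree excerpts above describe computing pairwise k-mer-based distances between all genomes and building a guide tree from them, but they do not define the distance K or the tree-building algorithm. As a stand-in, the sketch below assumes Euclidean distance between per-genome feature vectors (such as k-mer natural vectors) and builds a rooted tree with UPGMA, a standard average-linkage clustering method. Both choices, and the helper names `euclidean` and `upgma`, are hypothetical and are not taken from StrainSeeker.

```python
import math
from typing import Dict, List, Tuple, Union

# A tree is either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(vectors: Dict[str, List[float]]) -> Tree:
    """Hypothetical helper: rooted UPGMA tree over pairwise Euclidean distances."""
    if not vectors:
        raise ValueError("need at least one vector")
    names = sorted(vectors)
    # Cluster id -> (subtree, number of leaves in it).
    clusters: Dict[int, Tuple[Tree, int]] = {i: (nm, 1) for i, nm in enumerate(names)}
    # Distances keyed by (smaller id, larger id).
    dist: Dict[Tuple[int, int], float] = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            dist[(i, j)] = euclidean(vectors[names[i]], vectors[names[j]])
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), _ = min(dist.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del dist[(a, b)]
        # Size-weighted average distance from the merged cluster to the rest.
        for c in clusters:
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

For example, `upgma({"x": [0.0], "y": [1.0], "z": [10.0]})` first joins the two closest genomes and returns `("z", ("x", "y"))`. UPGMA is used here only because it is short and yields a rooted tree; neighbor joining is a common alternative when evolutionary rates differ across lineages.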
“…Some people tried to use the k-mer frequencies of genome sequences to construct phylogenetic trees, but the results were not satisfactory. To obtain trees consistent with the accepted phylogeny, the total k-mer set had to be screened [34–38]. It is known that the ideal approach is to use genome-wide sequence information to characterize genome evolution.…”
Section: Introduction
mentioning
confidence: 99%