2012
DOI: 10.12921/cmst.2012.18.01.5-10

A Method for Nucleotide Sequence Analysis

Abstract: Symbolic sequence decomposition into a set of consecutive, distinct subsequences (mers) is presented. Several statistical distributions of nucleotide subsequences are defined and analysed. Sequence entropy and similarity between sequences in terms of mer lengths distribution are defined. An alignment-free method of phylogenetic tree construction is proposed.
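The abstract does not spell out the decomposition rule, so the following is only a minimal sketch under stated assumptions. It assumes an LZ78-style incremental parsing, in which each mer is the shortest run of symbols that has not yet appeared as a mer, and it assumes the sequence entropy is the Shannon entropy of the mer-length distribution. The names `decompose` and `length_entropy` are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log2


def decompose(seq):
    """Split seq into consecutive, pairwise-distinct mers.

    Assumption: LZ78-style parsing; the paper's exact rule may differ.
    """
    seen, mers, start = set(), [], 0
    for end in range(1, len(seq) + 1):
        word = seq[start:end]
        if word not in seen:
            # Shortest unseen word starting at `start`: record it and move on.
            seen.add(word)
            mers.append(word)
            start = end
    # A trailing remainder that repeats an earlier mer is dropped here.
    return mers


def length_entropy(mers):
    """Shannon entropy (in bits) of the distribution of mer lengths."""
    counts = Counter(len(m) for m in mers)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


mers = decompose("AACGACG")
print(mers, len(mers), length_entropy(mers))
```

Under this sketch, the number of mers, `len(mers)`, plays the role of the complexity measure mentioned in the citation statements, while the full list of mers is what the later similarity measures compare.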

citations
Cited by 9 publications
(10 citation statements)
references
References 9 publications
“…In [12] the total number of mers was considered as the main result of the decomposition procedure and considered as a measure of complexity of the sequence. In [13] it was shown that the spectrum appears to be a very rich resource of information on the symbolic sequence.…”
Section: Similarity Representative Set and Membership Measure (mentioning)
confidence: 99%
“…When the spectra S1 and S2 of two sequences are known, some natural similarity measures can be defined. In [13] the normalised number of a common set of mers…”
Section: Similarity (mentioning)
confidence: 99%
“…In the k-tuple method, a genetic sequence is represented by a frequency vector of fixed length subsequence and the similarity or dissimilarity measures are found based on frequency vector of sub-sequences [6]. The probabilistic methods represent the sequences using the transition matrix of a Markov chain of a pre-specified order and comparison of two sequences is done by finding the distance between two transition matrices [7,8].…”
Section: Introduction (mentioning)
confidence: 99%
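The k-tuple method described in the statement above can be illustrated with a short sketch. It assumes a DNA alphabet `ACGT`, overlapping windows, relative frequencies, and Euclidean distance as the dissimilarity measure; the statement does not fix any of these choices, and the function names `ktuple_vector` and `euclidean` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def ktuple_vector(seq, k, alphabet="ACGT"):
    """Relative frequencies of all overlapping k-tuples, in lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)  # avoid division by zero for short input
    return [counts["".join(w)] / total for w in product(alphabet, repeat=k)]


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


u = ktuple_vector("ACGTACGT", 2)
v = ktuple_vector("AAAACCCC", 2)
print(euclidean(u, v))
```

The vector always has `len(alphabet) ** k` entries, which is why the statement speaks of a frequency vector of fixed-length subsequences: the representation grows exponentially with k, independent of the sequence length.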
“…The algorithm provides a better quantitative measure of complexity which is defined as the total number of strings. In my paper [3] the Ke and Tong algorithm was generalized to arbitrary sequences over a finite alphabet. It was also shown that the whole set of strings (not only their number) is a very rich source of information on symbolic sequences.…”
Section: Introduction (mentioning)
confidence: 99%
“…Neither of them provides a simple, satisfactory measure of global similarity between two arbitrary symbolic sequences over the same alphabet. An alternative similarity measure based on decomposition of sequences into a set of specific distinct words was proposed in [3]. The similarity between two sequences is related to the number of common words in the decomposition of two sequences.…”
Section: Introduction (mentioning)
confidence: 99%
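The idea of relating similarity to the number of common words can be sketched as follows. The citation statements do not give the normalisation, so this example assumes the Jaccard index (common words divided by all distinct words), and it assumes an LZ78-style decomposition into distinct words. Both choices, and the names `decompose` and `similarity`, are illustrative rather than the cited author's definitions.

```python
def decompose(seq):
    """Split seq into consecutive, pairwise-distinct words (LZ78-style assumption)."""
    seen, words, start = set(), [], 0
    for end in range(1, len(seq) + 1):
        word = seq[start:end]
        if word not in seen:
            seen.add(word)
            words.append(word)
            start = end
    return words  # a repeated trailing remainder is dropped


def similarity(a, b):
    """Share of decomposition words common to both sequences (Jaccard, an assumption)."""
    wa, wb = set(decompose(a)), set(decompose(b))
    union = wa | wb
    return len(wa & wb) / len(union) if union else 1.0


print(similarity("AACG", "ACG"))
```

Because the measure compares sets of words rather than aligned positions, it needs no alignment, which is consistent with the alignment-free approach the abstract proposes.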