2008
DOI: 10.1007/s12064-008-0032-1
Genes, information and sense: complexity and knowledge retrieval

Abstract: Information capacity of nucleotide sequences measures the unexpectedness of a continuation of a given string of nucleotides, thus having a sound relation to a variety of biological issues. A continuation is defined in a way maximizing the entropy of the ensemble of such continuations. The capacity is defined as a mutual entropy of real frequency dictionary of a sequence with respect to the one bearing the most expected continuations; it does not depend on the length of strings contained in a dictionary. Variou…
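The definition in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: the sequence is treated as circular, the "most expected" frequency of a word of length q is estimated from the dictionary of words of length q - 1 as f(left) * f(right) / f(core), and the capacity is the sum of f * ln(f / f_expected) over observed words. The function names are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from math import log


def frequency_dictionary(seq, q):
    """Relative frequencies of all words of length q in a circular sequence."""
    # Assumption: the sequence is closed into a ring, so every position starts a word.
    ring = seq + seq[:q - 1]
    counts = Counter(ring[i:i + q] for i in range(len(seq)))
    return {word: n / len(seq) for word, n in counts.items()}


def expected_frequency(word, shorter, core_dict):
    """Assumed maximum-entropy estimate built from the two overlapping subwords."""
    left, right, core = word[:-1], word[1:], word[1:-1]
    # For words of length 2 the core is empty and the estimate is f(left) * f(right).
    denom = core_dict[core] if core else 1.0
    return shorter[left] * shorter[right] / denom


def information_capacity(seq, q):
    """Relative entropy of the real dictionary against the reconstructed one (q >= 2)."""
    real = frequency_dictionary(seq, q)
    shorter = frequency_dictionary(seq, q - 1)
    core_dict = frequency_dictionary(seq, q - 2) if q > 2 else {}
    return sum(
        f * log(f / expected_frequency(word, shorter, core_dict))
        for word, f in real.items()
    )


if __name__ == "__main__":
    sequence = "ACGT" * 5
    print(information_capacity(sequence, 2))
    print(information_capacity(sequence, 3))
```

In this sketch a strictly periodic sequence such as "ACGT" repeated gives a positive value at q = 2, because pairs are far from independent, and zero at q = 3, because triplets are fully predicted by pairs.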

Cited by 15 publications
(16 citation statements)
References 10 publications
“…For W (3,1) N = M , and it is not so in general case. A frequency dictionary W q of nucleotide sequences is claimed to be an entity bearing a lot of information on that latter [15][16][17][18][19][20]. A consistent and comprehensive study of frequency dictionaries answers the questions concerning the statistical and information properties of DNA sequences.…”
Section: Frequency Dictionary and Genome Fragmentation (mentioning)
confidence: 99%
“…Formula (5) looks like a Markov process expression, while it is not: it is derived with no hypothesis towards the Markov property of an origin sequence (see [1][2][3][4][5][6] for details). Thus, another idea to figure out a lost string is to distinguish "inevitably lost" strings from "unexpectedly lost" ones.…”
Section: Discussion (mentioning)
confidence: 99%
“…A frequency dictionary W q of nucleotide sequences is claimed to be an entity bearing a lot of information on that latter [1][2][3][4][5][6]. A consistent and comprehensive study of frequency dictionaries answers the questions concerning the statistical and information properties of DNA sequences.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some ab initio methods have been developed in the literature, such as k-mer [3], relative entropy [4], and information content [5–8]. In these methods, the frequency information of a word in a DNA sequence was used widely, but the position information was not paid enough attention.…”
Section: Introduction (mentioning)
confidence: 99%