2009
DOI: 10.1103/physreve.79.061911

Quantitative measure of randomness and order for complete genomes

Abstract: We propose an order index, phi, which gives a quantitative measure of randomness and order of complete genomic sequences. It maps genomes to a number from 0 (random and of infinite length) to 1 (fully ordered) and applies regardless of sequence length. The 786 complete genomic sequences in GenBank were found to have phi values in a very narrow range, phi_g = 0.031 (+0.028, -0.015). We show this implies that genomes are halfway toward being completely random, or, at the "edge of chaos." We further show that artificia…

Cited by 11 publications (9 citation statements)
References 40 publications
“…This interests us given a genome is a computing device made of nucleic acid that is the product of evolution. The overall position of all the human populations supports a controversial concept from complex systems science [55,57-61] that genomes are poised at or close to ‘The Edge of Chaos.’ This conclusion resonates closely with that of Kong et al, [62] who analysed 384 prokaryotic and 402 eukaryotic genomes using an novel regularity/order index called ø and based on averages of nucleotide distributions in a given sequence of pre-defined length. Figure 10 also summarise the possible mechanistic explanations for the various trajectories taken by the populations and individuals through information space, based on considerations of both the implications of our data modelling coupled with the real world mammalian genomes. We see different spatial impacts of LD and extent of outbreeding depending on the particular population under consideration.…”
Section: Discussion
supporting
confidence: 89%
“…Although the fundamental role of sequence duplication in genome evolution is long appreciated [17,31-33], the first quantitative model-independent characterization of which we are aware is the recent work of Lee and coworkers on 'genomic equivalence length' [34] based on the study of m-mer entropy (m ≤ 9) of modern natural genomes, which stresses the dominant role of duplication in genome growth. There exist other characterizations that may be more readily interpreted in terms of currently understood biological sequence type and function [13,14], but they are model-dependent and may therefore not always be the most suitable tools for discovery of novel repeats and functional elements.…”
Section: A Duplication Is Fundamental To Genome Evolution
mentioning
confidence: 99%
“…Lee et al also investigated models of genome growth with monoscale (δ-function) duplication lengths [34]; however, they did not study the distributions of longer m-mers, which constitute the focus of our work in general and this manuscript in particular; the calculations described above appear to eliminate monoscale duplication models as candidates for neutral evolution of most natural genomes. Lee et al characterize nature as "the blind plagiarizer," duplicating sequences within a genome at random and letting selection determine what sticks.…”
Section: A Duplication Is Fundamental To Genome Evolution
mentioning
confidence: 99%
“…Moreover, an aspect that is missing in classical Shannon’s conceptual apparatus is relevant in our approach: random strings and pseudo-random generation algorithms, which now can be easily produced and analyzed [34]. In fact, it is natural to assume that the complexity of a genome increases with its “distance” from randomness [35,36], as identified by means of a suitable comparison between the genome under investigation and random genomes of the same length. This idea alone provides important clues about the correct k-mer length to consider in our genome analyses, because theoretical and experimental analyses show that random genomes reach their entropic maxima for k-mers of length lg_2(n), where n is the genome length.…”
mentioning
confidence: 99%
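
The last citation statement compares a genome against random genomes of the same length through k-mer entropy. The sketch below is one way a reader might explore that idea numerically; it is not the cited authors' method. The function names `kmer_entropy` and `random_genome`, the use of overlapping k-mers, and uniform sampling over the bases A, C, G, T are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def kmer_entropy(seq: str, k: int) -> float:
    """Empirical Shannon entropy, in bits, of the overlapping k-mers of seq.

    Hypothetical helper: overlapping windows are an assumption, not a
    convention taken from the cited papers.
    """
    total = len(seq) - k + 1  # number of overlapping k-mer windows
    if k < 1 or total < 1:
        raise ValueError("k must be between 1 and len(seq)")
    counts = Counter(seq[i:i + k] for i in range(total))
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def random_genome(n: int, seed: int = 0) -> str:
    """Pseudo-random sequence of length n with uniformly sampled bases
    (an assumed, simplistic model of a 'random genome')."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


if __name__ == "__main__":
    n = 4096
    genome = random_genome(n)
    print(f"lg_2(n) = {math.log2(n):.2f}")
    for k in range(1, 15):
        # Entropy can exceed neither 2k bits (four-letter alphabet)
        # nor log2 of the number of windows.
        bound = min(2 * k, math.log2(n - k + 1))
        print(f"k={k:2d}  H={kmer_entropy(genome, k):7.3f}  bound={bound:7.3f}")
```

Running the script tabulates the empirical entropy against k next to the two elementary upper bounds, so the k at which the random sequence stops gaining entropy can be read off and compared with lg_2(n). Replacing `genome` with a real sequence of the same length would give the kind of comparison the statement describes.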