1998
DOI: 10.1103/physrevlett.80.1344

Sequence Compositional Complexity of DNA through an Entropic Segmentation Method

Abstract: A new complexity measure, based on the entropic segmentation of DNA sequences into compositionally homogeneous domains, is proposed. Sequence compositional complexity (SCC) deals directly with the complex heterogeneity in nonstationary DNA sequences. The plot of SCC as a function of significance level provides a profile of sequence structure at different length scales. SCC is found to be higher in sequences with long-range correlation than those without, and higher in noncoding sequences than coding sequences.…

Cited by 79 publications (95 citation statements)
References 32 publications
“…During the past ten years, there has been intense discussion about the existence, the nature and the origin of LRC in DNA sequences. Besides Fourier and autocorrelation analysis, different techniques including mutual information functions [61,71,77,78], DNA walk representation [64,75,79,80,81,82,83,84], Zipf analysis [85,86,87] and entropies [88,89,90,91] were used for statistical analysis of DNA sequences. A lot of effort has been spent to address rather struggling questions.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Choose a second random number $r \in [0, 1]$ and compare it to the value of the exit distance distribution $q_{PU,n}$ or $q_{PY,n}$, depending on the current state on the chain. 4. If $r \le q_{PU,n}$ (or $r \le q_{PY,n}$), then the sequence is extended by $n$ units of PU (or PY).…”
Section: A Model DNA (citation type: mentioning)
confidence: 99%
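The accept/reject step quoted above can be sketched in a few lines of Python. This is only an illustration of that single step: the function name `try_extend`, the dictionary `q`, and the power-law shapes chosen for $q_{PU,n}$ and $q_{PY,n}$ are assumptions made here, since the excerpt does not give the earlier steps or the actual exit distance distributions.

```python
import random

def try_extend(sequence, state, n, q, rng=random):
    """Accept or reject a proposed run of n units of `state` ("PU" or "PY").

    `q` maps each state to a function n -> q_{state,n}. The run is appended
    only when the random draw r satisfies r <= q_{state,n}.
    """
    r = rng.random()              # the "second random number" r
    if r <= q[state](n):          # compare r with q_{PU,n} or q_{PY,n}
        sequence.extend([state] * n)
        return True
    return False

# Hypothetical exit distance distributions (NOT taken from the cited paper).
q = {
    "PU": lambda n: n ** -2.0,
    "PY": lambda n: n ** -2.5,
}

seq = []
rng = random.Random(0)
# q_{PU,1} = 1 under the assumed form, so this proposal is always accepted.
accepted = try_extend(seq, "PU", 1, q, rng)
```

How `n` is proposed and how the chain switches between PU and PY belong to steps that the excerpt does not show, so they are left to the caller here.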
“…The observed complexity of these nested sequences has been shaped during evolutionary time based on functional needs. Processes like single nucleotide mutations, insertion and deletion of segments, multiple repetitions of elements acting simultaneously over different length and time scales have shaped the complexity of current day genomes producing intriguing statistical properties [3–8]. In this latter setting early investigations have shown that the succession of bases along coding regions in higher organisms presents short range correlations, whereas non-coding regions exhibit long-range correlations [9–11].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…We therefore do not introduce further segment boundaries. In the first approach, the Jensen-Shannon divergences of new segment boundaries are tested for statistical significance against various $\chi^2$ distributions with the appropriate degrees of freedom (Bernaola-Galván et al., 1996; Román-Roldán et al., 1998). When no new segment boundaries are more significant than the chosen confidence level p, the recursive segmentation terminates.…”
Section: Segmentation (citation type: mentioning)
confidence: 99%
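A rough sketch of the significance test described in this statement follows. It assumes a four-letter alphabet and a single cut at a fixed position, in which case the statistic $2N \ln 2 \cdot D_{JS}$ (with $D_{JS}$ in bits) is approximately $\chi^2$-distributed with 3 degrees of freedom. The function names are illustrative, and the sketch ignores the correction needed when the cut is chosen as the maximum over all positions, so it should not be read as the cited authors' exact procedure.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy (bits) of the symbol frequencies in `symbols`."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence (bits) between two segments."""
    n1, n2 = len(left), len(right)
    n = n1 + n2
    return (shannon_entropy(left + right)
            - (n1 / n) * shannon_entropy(left)
            - (n2 / n) * shannon_entropy(right))

def chi2_sf_df3(x):
    """Survival function of the chi-square distribution with 3 degrees of
    freedom, written in closed form so that only the stdlib is needed."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

def cut_p_value(left, right):
    """Approximate p-value for the hypothesis that both segments share one
    composition (assumption: 4 symbols, fixed cut, hence 3 d.o.f.)."""
    n = len(left) + len(right)
    statistic = 2 * n * math.log(2) * js_divergence(left, right)
    return chi2_sf_df3(statistic)
```

For example, `cut_p_value("A" * 50, "C" * 50)` is vanishingly small, while two segments with identical composition give a p-value of 1. A boundary would be kept only when its p-value falls below the chosen significance threshold.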
“…To find the unknown segment boundaries $t_{i,m_i}$ separating segments $m_i$ and $m_i + 1$, we use the recursive segmentation scheme introduced by Bernaola-Galván et al. (1996) and Román-Roldán et al. (1998). In this segmentation scheme, we check how likely it is for the point $t$ in the time series $x = (x_1,$ .…”
Section: Segmentation (citation type: mentioning)
confidence: 99%
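The recursive scheme mentioned here can be illustrated with a short Python sketch: scan every candidate cut, keep the one with the largest Jensen-Shannon divergence, and recurse on both halves while the cut remains strong enough. The stopping rule below is a plain threshold on $2N \ln 2 \cdot D_{JS}$, used as a stand-in for the significance test of the cited papers; the names `best_cut`, `segment`, and `threshold` are assumptions of this sketch.

```python
import math
from collections import Counter

def entropy_from_counts(counts, n):
    """Shannon entropy (bits) from symbol counts that sum to n."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)

def best_cut(seq):
    """Return (position, divergence) of the cut maximizing the JS divergence,
    or (None, 0.0) if no cut gives a positive divergence."""
    n = len(seq)
    total = Counter(seq)
    h_total = entropy_from_counts(total, n)
    left = Counter()
    best_pos, best_d = None, 0.0
    for t in range(1, n):
        left[seq[t - 1]] += 1          # running counts of seq[:t]
        right = total - left           # counts of seq[t:]
        d = (h_total
             - (t / n) * entropy_from_counts(left, t)
             - ((n - t) / n) * entropy_from_counts(right, n - t))
        if d > best_d:
            best_pos, best_d = t, d
    return best_pos, best_d

def segment(seq, threshold, offset=0):
    """Recursively split `seq`; return the sorted boundary positions.

    A cut is accepted when 2 * N * ln(2) * D_JS >= threshold, which is an
    illustrative stopping rule rather than the published significance test.
    """
    pos, d = best_cut(seq)
    if pos is None or 2 * len(seq) * math.log(2) * d < threshold:
        return []
    return (segment(seq[:pos], threshold, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, offset + pos))

boundaries = segment("A" * 20 + "C" * 20, threshold=10.0)
```

With this input the only accepted boundary is at position 20, and a compositionally uniform sequence such as `"ACGT" * 10` yields no boundaries at the same threshold. The value 10.0 is arbitrary and would normally be derived from the chosen significance level.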