Ramón Román-Roldán scite author profile

We study statistical properties of the Jensen-Shannon divergence D, which quantifies the difference between probability distributions, and which has been widely applied to analyses of symbolic sequences. We present three interpretations of D in the framework of statistical physics, information theory, and mathematical statistics, and obtain approximations of the mean, the variance, and the probability distribution of D in random, uncorrelated sequences. We present a segmentation method based on D that is able to segment a nonstationary symbolic sequence into stationary subsequences, and apply this method to DNA sequences, which are known to be nonstationary on a wide range of different length scales.
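
As a rough illustration of the quantity described in this abstract, the following Python sketch computes D between two adjacent subsequences and scans for the cut point that maximizes it. It assumes the length-weighted form of the Jensen-Shannon divergence (entropy of the whole minus the length-weighted entropies of the parts); the function names are hypothetical, and the naive scan recomputes counts at every position, so it is meant for short sequences only. It does not include the significance test that the abstract derives from the distribution of D.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())


def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two subsequences.

    Assumption: each part is weighted by its share of the total length.
    """
    n1, n2 = len(left), len(right)
    n = n1 + n2
    return (shannon_entropy(left + right)
            - (n1 / n) * shannon_entropy(left)
            - (n2 / n) * shannon_entropy(right))


def best_cut(seq):
    """Return (position, D) for the split point that maximizes D."""
    candidates = ((i, js_divergence(seq[:i], seq[i:]))
                  for i in range(1, len(seq)))
    return max(candidates, key=lambda pair: pair[1])


position, d_max = best_cut("AAAATTTT")
```

In a segmentation procedure of the kind the abstract describes, the cut at `position` would be accepted only if `d_max` is statistically significant, and the procedure would then be repeated on each resulting subsequence.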

Compositional segmentation and long-range fractal correlations in DNA sequences

Bernaola‐Galván

1996

A segmentation algorithm based on the Jensen-Shannon entropic divergence is used to decompose long-range correlated DNA sequences into statistically significant, compositionally homogeneous patches. By adequately setting the significance level for segmenting the sequence, the underlying power-law distribution of patch lengths can be revealed. Some of the identified DNA domains were uncorrelated, but most of them continued to display long-range correlations even after several steps of recursive segmentation, thus indicating a complex multi-length-scaled structure for the sequence. On the other hand, by separately shuffling each segment, or by randomly rearranging the order in which the different segments occur in the sequence, shuffled sequences preserving the original statistical distribution of patch lengths were generated. Both types of random sequences displayed the same correlation scaling exponents as the original DNA sequence, thus demonstrating that neither the internal structure of patches nor the order in which these are arranged in the sequence is critical; therefore, long-range correlations in nucleotide sequences seem to rely only on the power-law distribution of patch lengths.
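
The two kinds of shuffled sequences mentioned in this abstract can be sketched as follows. This is a minimal illustration in Python, assuming the patches have already been obtained from a segmentation step; the function names and the example patches are hypothetical and are not taken from the paper.

```python
import random


def shuffle_within_segments(segments, rng):
    """Shuffle the symbols inside each segment; boundaries and order are kept."""
    out = []
    for seg in segments:
        symbols = list(seg)
        rng.shuffle(symbols)
        out.append("".join(symbols))
    return out


def shuffle_segment_order(segments, rng):
    """Rearrange whole segments at random; each segment's content is kept."""
    out = list(segments)
    rng.shuffle(out)
    return out


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
patches = ["AAAATAAA", "GCGGCC", "ATATTTAATA"]  # hypothetical patches
surrogate_a = "".join(shuffle_within_segments(patches, rng))
surrogate_b = "".join(shuffle_segment_order(patches, rng))
```

Both surrogates keep the same multiset of patch lengths as the input, which is the property the abstract identifies as sufficient to preserve the correlation scaling exponents.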

Finding Borders between Coding and Noncoding DNA Regions by an Entropic Segmentation Method

et al. 2000

We present a new computational approach to finding borders between coding and noncoding DNA. This approach has two features: (i) DNA sequences are described by a 12-letter alphabet that captures the differential base composition at each codon position, and (ii) the search for the borders is carried out by means of an entropic segmentation method which uses only the general statistical properties of coding DNA. We find that this method is highly accurate in finding borders between coding and noncoding regions and requires no "prior training" on known data sets. Our results appear to be more accurate than those obtained with moving windows in the discrimination of coding from noncoding DNA.
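
One way to picture the 12-letter alphabet in feature (i) is to label each base with its position modulo 3, giving 4 bases × 3 positions = 12 symbols. The Python sketch below does this under the assumption that positions are counted from the start of the sequence (plus an optional `offset`); the function name, the `offset` parameter, and the symbol spelling such as `A0` are hypothetical choices for illustration, not the paper's notation.

```python
from collections import Counter


def to_twelve_letter(dna, offset=0):
    """Label each base with its position modulo 3, e.g. 'A' at phase 1 -> 'A1'."""
    return [f"{base}{(i + offset) % 3}" for i, base in enumerate(dna.upper())]


symbols = to_twelve_letter("ATGGCC")
composition = Counter(symbols)  # 12-letter composition of the sequence
```

In a coding region the base composition differs between codon positions, so the 12-symbol composition computed this way would be expected to change at a coding/noncoding border, which is what a composition-based segmentation can detect.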

Sequence Compositional Complexity of DNA through an Entropic Segmentation Method

1998

A new complexity measure, based on the entropic segmentation of DNA sequences into compositionally homogeneous domains, is proposed. Sequence compositional complexity (SCC) deals directly with the complex heterogeneity in nonstationary DNA sequences. The plot of SCC as a function of significance level provides a profile of sequence structure at different length scales. SCC is found to be higher in sequences with long-range correlation than those without, and higher in noncoding sequences than coding sequences. Furthermore, a general agreement is found between the SCC of the DNA sequence, on one hand, and the biological complexity of the organism, on the other, attributable to an increasingly complex organization of noncoding DNA over the course of evolution. [S0031-9007(97)05210-1]

Study of statistical correlations in DNA sequences

Bernaola‐Galván

Carpena

Román-Roldán

et al. 2002

Gene

Isochore chromosome maps of eukaryotic genomes

Oliver

Bernaola‐Galván

Carpena

et al. 2001

Gene

Application of information theory to DNA sequence analysis: A review

Román-Roldán

Bernaola‐Galván

Oliver

1996

Pattern Recognition

SEGMENT: identifying compositional domains in DNA sequences

Oliver

Román-Roldán

Pérez

et al. 1999

Here we describe a heuristic segmentation algorithm for DNA sequences, which was implemented in a Windows program (SEGMENT). The program divides a DNA sequence into compositionally homogeneous domains by iterating a local optimization procedure at a given statistical significance. Once a sequence is partitioned into domains, a global measure of sequence compositional complexity (SCC), accounting for both the sizes and compositional biases of all the domains in the sequence, is derived. SEGMENT computes SCC as a function of the significance level, which provides a multiscale view of sequence complexity.
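
The abstract does not spell out the formula for SCC. The Python sketch below assumes one plausible form: the Jensen-Shannon divergence generalized to all domains, that is, the entropy of the whole sequence minus the length-weighted entropies of the domains. This is an illustration under that assumption, not SEGMENT's actual implementation, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())


def compositional_complexity(domains):
    """Assumed SCC: H(whole sequence) - sum_i (n_i / N) * H(domain_i)."""
    total = sum(len(d) for d in domains)
    whole = "".join(domains)
    weighted = sum((len(d) / total) * shannon_entropy(d) for d in domains)
    return shannon_entropy(whole) - weighted


# A single homogeneous domain carries no compositional complexity,
# while domains with very different compositions give a larger value.
scc_single = compositional_complexity(["ACGTACGT"])
scc_split = compositional_complexity(["AAAA", "CCCC", "GGGG", "TTTT"])
```

Under this assumed definition the value depends on both the relative sizes of the domains (through the weights) and their compositional biases (through the entropies), which matches the description in the abstract. Repeating the calculation for partitions obtained at different significance levels would give the multiscale profile mentioned there.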
