We study statistical properties of the Jensen-Shannon divergence D, which quantifies the difference between probability distributions, and which has been widely applied to analyses of symbolic sequences. We present three interpretations of D in the framework of statistical physics, information theory, and mathematical statistics, and obtain approximations of the mean, the variance, and the probability distribution of D in random, uncorrelated sequences. We present a segmentation method based on D that is able to segment a nonstationary symbolic sequence into stationary subsequences, and apply this method to DNA sequences, which are known to be nonstationary on a wide range of different length scales.
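As a rough illustration of the quantity and the segmentation step described in this abstract, the sketch below computes D between the two halves of a sequence and scans for the cut that maximizes it. It assumes weights proportional to segment length and entropies in bits. The function names are hypothetical, and the significance test that decides whether a cut is accepted is not shown.

```python
import math
from collections import Counter

def shannon_entropy(counts, total):
    """Shannon entropy (bits) of a distribution given as symbol counts."""
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def js_divergence(left, right):
    """Jensen-Shannon divergence between two non-empty symbol sequences.

    Assumption: each segment is weighted by its share of the total length.
    """
    n1, n2 = len(left), len(right)
    n = n1 + n2
    c1, c2 = Counter(left), Counter(right)
    h_all = shannon_entropy(c1 + c2, n)
    return (h_all
            - (n1 / n) * shannon_entropy(c1, n1)
            - (n2 / n) * shannon_entropy(c2, n2))

def best_cut(seq):
    """Return (position, D) for the cut that maximizes D.

    Brute force: recomputes the counts at every position, so O(n^2).
    Returns (None, 0.0) if no cut gives a positive divergence.
    """
    best = (None, 0.0)
    for i in range(1, len(seq)):
        d = js_divergence(seq[:i], seq[i:])
        if d > best[1]:
            best = (i, d)
    return best
```

A recursive segmentation would apply `best_cut` again to each side of an accepted cut, stopping when the maximum D is no longer statistically significant.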
In this paper, we propose a new approach to analyzing the conductivity-concentration data of ionic surfactant solutions for the determination of micellization parameters such as the critical micelle concentration and the degree of counterion dissociation. The method is based on fitting the experimental raw data to a simple nonlinear function obtained by direct integration of a Boltzmann-type sigmoidal function. The advantages of this procedure over the most commonly used alternatives, namely the intersection of the regression lines fitted to the data above and below the critical micelle concentration and methods that differentiate the experimental data, are demonstrated by means of Monte Carlo simulations combined with nonlinear fits based on the Levenberg-Marquardt algorithm. The proposed method works well on real systems that present a very gradual transition from the premicellar to the postmicellar region, in which the break in the conductivity-concentration plot is usually hard to determine.
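The fitting function mentioned in this abstract can be written out by integrating a Boltzmann sigmoid. The symbols below are assumptions chosen for illustration, not necessarily the authors' notation: $A_1$ and $A_2$ are the premicellar and postmicellar slopes, $c_0$ is the center of the transition (identified with the critical micelle concentration), $\Delta c$ is its width, and $\kappa_0$ is the conductivity at zero concentration.

```latex
\frac{d\kappa}{dc} = A_2 + \frac{A_1 - A_2}{1 + e^{(c - c_0)/\Delta c}}
\quad\Longrightarrow\quad
\kappa(c) = \kappa_0 + A_1 c
  + \Delta c\,(A_2 - A_1)\,
    \ln\!\left[\frac{1 + e^{(c - c_0)/\Delta c}}{1 + e^{-c_0/\Delta c}}\right]
```

The slope tends to $A_1$ for $c \ll c_0$ and to $A_2$ for $c \gg c_0$, and $\kappa(0) = \kappa_0$, so the two straight-line regimes are recovered as limits of a single smooth curve. Under the usual slope-ratio convention, $A_2/A_1$ would serve as an estimate of the degree of counterion dissociation.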
A segmentation algorithm based on the Jensen-Shannon entropic divergence is used to decompose long-range correlated DNA sequences into statistically significant, compositionally homogeneous patches. By adequately setting the significance level for segmenting the sequence, the underlying power-law distribution of patch lengths can be revealed. Some of the identified DNA domains were uncorrelated, but most of them continued to display long-range correlations even after several steps of recursive segmentation, thus indicating a complex multi-length-scaled structure for the sequence. On the other hand, by separately shuffling each segment, or by randomly rearranging the order in which the different segments occur in the sequence, shuffled sequences preserving the original statistical distribution of patch lengths were generated. Both types of random sequences displayed the same correlation scaling exponents as the original DNA sequence, thus demonstrating that neither the internal structure of patches nor the order in which these are arranged in the sequence is critical; therefore, long-range correlations in nucleotide sequences seem to rely only on the power-law distribution of patch lengths.
We introduce a segmentation algorithm to probe the temporal organization of heterogeneities in human heartbeat interval time series. We find that the lengths of segments with different local mean heart rates follow a power-law distribution and show that this scale-invariant structure is not a simple consequence of the long-range correlations present in the data. The differences in mean heart rates between consecutive segments display a common functional form, but with different parameters for healthy individuals and for heart-failure patients. These findings suggest that there is relevant physiological information hidden in the heterogeneities of the heartbeat time series.
According to Bloch's theorem, electronic wavefunctions in perfectly ordered crystals are extended, which implies that the probability of finding an electron is the same over the entire crystal. Such extended states can lead to metallic behaviour. But when disorder is introduced in the crystal, electron states can become localized, and the system can undergo a metal-insulator transition (also known as an Anderson transition). Here we theoretically investigate the effect on the physical properties of the electron wavefunctions of introducing long-range correlations in the disorder in one-dimensional binary solids, and find a correlation-induced metal-insulator transition. We perform numerical simulations using a one-dimensional tight-binding model, and find a threshold value for the exponent characterizing the long-range correlations of the system. Above this threshold, and in the thermodynamic limit, the system behaves as a conductor within a broad energy band; below threshold, the system behaves as an insulator. We discuss the possible relevance of this result for electronic transport in DNA, which displays long-range correlations and has recently been reported to be a one-dimensional disordered conductor.
We investigate how various linear and nonlinear transformations affect the scaling properties of a signal, using the detrended fluctuation analysis (DFA). Specifically, we study the effect of three types of transforms: linear, nonlinear polynomial and logarithmic filters. We compare the scaling properties of signals before and after the transform. We find that linear filters do not change the correlation properties, while the effect of nonlinear polynomial and logarithmic filters strongly depends on (a) the strength of correlations in the original signal, (b) the power of the polynomial filter and (c) the offset in the logarithmic filter. We further investigate the correlation properties of three analytic functions: exponential, logarithmic, and power-law. While these three functions have in general different correlation properties, we find that there is a broad range of variable values, common for all three functions, where they exhibit identical scaling behavior. We further note that the scaling behavior of a class of other functions can be reduced to these three typical cases. We systematically test the performance of the DFA method in accurately estimating long-range power-law correlations in the output signals for different parameter values in the three types of filters, and the three analytic functions we consider.
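For readers unfamiliar with DFA, the following is a minimal sketch of first-order DFA (linear detrending in non-overlapping windows), written in plain Python. The function names and the choice of window sizes are assumptions for illustration; the abstract does not specify the detrending order or implementation used.

```python
import math

def dfa_fluctuation(signal, window):
    """RMS fluctuation F(n) of first-order DFA for one window size n >= 3."""
    mean = sum(signal) / len(signal)
    # Integrated profile y(k) = sum_{i<=k} (x_i - mean).
    profile, running = [], 0.0
    for x in signal:
        running += x - mean
        profile.append(running)
    n_windows = len(profile) // window  # trailing remainder is ignored
    t = range(window)
    t_mean = (window - 1) / 2
    t_var = sum((ti - t_mean) ** 2 for ti in t)
    sq_sum = 0.0
    for w in range(n_windows):
        seg = profile[w * window:(w + 1) * window]
        y_mean = sum(seg) / window
        # Least-squares line through the segment, then squared residuals.
        slope = sum((ti - t_mean) * (yi - y_mean)
                    for ti, yi in zip(t, seg)) / t_var
        sq_sum += sum((yi - y_mean - slope * (ti - t_mean)) ** 2
                      for ti, yi in zip(t, seg))
    return math.sqrt(sq_sum / (n_windows * window))

def dfa_exponent(signal, windows):
    """Scaling exponent: least-squares slope of log F(n) versus log n."""
    xs = [math.log(n) for n in windows]
    ys = [math.log(dfa_fluctuation(signal, n)) for n in windows]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
            / sum((x - x_mean) ** 2 for x in xs))
```

A linear transform `a * x + b` of the input multiplies every F(n) by `|a|`, which shifts log F(n) by a constant and leaves the fitted slope unchanged. This is consistent with the statement above that linear filters do not change the correlation properties.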
We show that words in a text present long-range frequency fluctuations due to a strong self-attraction, which is directly related to the relevance of the term to the text considered. The standard deviation of the distance between successive occurrences of a word is an excellent parameter to quantify this self-attraction, and provides us with an effective tool for automatic keyword extraction. DNA sequences present the same feature: “words”, for example codons in the coding part of the sequences, attract one another.
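The spacing statistic described here is simple to compute. The sketch below measures the standard deviation of the gaps between successive occurrences of a word. Dividing by the mean gap, so that frequent and rare words are comparable, is an assumption added for illustration, as is the function name.

```python
import statistics

def spacing_sigma(tokens, word):
    """Std. dev. of gaps between successive occurrences, in units of the mean gap.

    Returns None when the word occurs fewer than three times (fewer than two gaps).
    """
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None
    return statistics.pstdev(gaps) / statistics.mean(gaps)
```

Evenly spaced words give a value of zero, while words that cluster give larger values. Ranking the vocabulary of a text by this value is one way to turn the statistic into a keyword list.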
We present a new computational approach to finding borders between coding and noncoding DNA. This approach has two features: (i) DNA sequences are described by a 12-letter alphabet that captures the differential base composition at each codon position, and (ii) the search for the borders is carried out by means of an entropic segmentation method which uses only the general statistical properties of coding DNA. We find that this method is highly accurate in finding borders between coding and noncoding regions and requires no "prior training" on known data sets. Our results appear to be more accurate than those obtained with moving windows in the discrimination of coding from noncoding DNA.
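One plausible way to build the 12-letter alphabet is to label each base with its position in the codon, giving 4 bases × 3 positions. The sketch below does this under the assumption that the reading frame is anchored at a given offset; the function name and the `frame` parameter are hypothetical, and the abstract does not say how the authors handle the frame.

```python
def to_codon_position_alphabet(dna, frame=0):
    """Map each base to a symbol such as 'A1' or 'G3' (base + codon position).

    Assumption: `frame` is the index of a base in codon position 1.
    """
    return [f"{base}{(i - frame) % 3 + 1}"
            for i, base in enumerate(dna.upper())]
```

The resulting symbol sequence can then be passed to an entropic segmentation procedure in place of the raw four-letter sequence.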