While we would like to predict exact values, the available information is rarely sufficient, usually allowing only conditional probability distributions to be predicted. This article discusses the hierarchical correlation reconstruction (HCR) methodology for such prediction, using the example of bid-ask spreads, which are usually unavailable and are here predicted from more accessible data such as closing price, volume, high/low prices and returns. In the HCR methodology, as in copula theory, we first normalized the marginal distributions so that they were nearly uniform. We then modelled joint densities as linear combinations of orthonormal polynomials, obtaining their decomposition into mixed moments. Next, we modelled each moment of the predicted variable separately as a linear combination of mixed moments of the known variables using least squares linear regression. Combining these predicted moments yields the predicted density as a polynomial, from which we can, for example, calculate the expected value, as well as the variance to quantify the uncertainty of the prediction; alternatively, the entire distribution can be used for more accurate further calculations or for generating random values. Ten-fold cross-validation log-likelihood tests were conducted for 22 DAX companies, leading to very accurate predictions, especially when an individual model was used for each company, as significant differences were found between their behaviours. An additional advantage of this methodology is its low computational cost: estimating and evaluating a model with hundreds of parameters and thousands of data points takes about a second on a standard computer.
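The HCR pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it assumes rescaled Legendre polynomials as the orthonormal basis on [0, 1], a toy synthetic dataset, and illustrative variable names; the degree, clipping threshold, and data are all placeholders.

```python
import numpy as np
from numpy.polynomial import legendre

def cdf_normalize(x):
    # Empirical-CDF transform: map ranks to (0, 1) so the marginal is ~uniform.
    ranks = np.argsort(np.argsort(x))
    return (ranks + 0.5) / len(x)

def legendre_features(u, degree):
    # Orthonormal basis on [0, 1]: f_j(u) = sqrt(2j+1) * P_j(2u - 1),
    # where P_j is the j-th Legendre polynomial; f_0 = 1 is left implicit.
    cols = []
    for j in range(1, degree + 1):
        coefs = np.zeros(j + 1)
        coefs[j] = 1.0
        cols.append(np.sqrt(2 * j + 1) * legendre.legval(2 * u - 1, coefs))
    return np.column_stack(cols)

# Toy data: predict y's conditional density from two observed variables.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
y = 0.5 * x1 + 0.3 * x2 + rng.normal(scale=0.5, size=500)

u1, u2, v = cdf_normalize(x1), cdf_normalize(x2), cdf_normalize(y)
X = np.hstack([np.ones((500, 1)),
               legendre_features(u1, 3), legendre_features(u2, 3)])
M = legendre_features(v, 3)  # moments of the predicted variable

# Least squares: each moment of y as a linear combination of moments of x.
beta, *_ = np.linalg.lstsq(X, M, rcond=None)

# Predicted conditional density (a polynomial) for the first data point,
# clipped below to keep it a valid nonnegative density.
a = X[0] @ beta                       # predicted moments a_1..a_3
grid = np.linspace(0.005, 0.995, 199)
density = np.maximum(1.0 + legendre_features(grid, 3) @ a, 0.03)
```

From `density` one can then read off the expected value, variance, or draw random values, as the abstract describes; the normalized variable would be mapped back to the original scale through the inverse CDF transform.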
Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; and (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow (polylogarithmic) function of sequence length, implying that through replicated extrusion we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.
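Insight (i), the exponential decay of single-read accuracy, can be illustrated with a toy Monte Carlo simulation of an IID indel channel. This is a simplified sketch under assumed parameters (deletion and insertion probabilities of 1% per base), not the paper's error model or reconstruction algorithm:

```python
import random

def indel_channel(seq, p_del=0.01, p_ins=0.01, alphabet="ACGT", rng=None):
    # Pass a sequence through a simple IID indel channel: each base is
    # deleted with prob p_del; before each base, a uniformly random base
    # is inserted with prob p_ins.
    rng = rng or random.Random()
    out = []
    for base in seq:
        if rng.random() < p_ins:
            out.append(rng.choice(alphabet))
        if rng.random() >= p_del:
            out.append(base)
    return "".join(out)

def error_free_fraction(n, trials=2000, seed=0, **channel_kwargs):
    # Monte Carlo estimate of P(read comes through unchanged) at length n.
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(n))
    hits = sum(indel_channel(seq, rng=rng, **channel_kwargs) == seq
               for _ in range(trials))
    return hits / trials

for n in (50, 200, 800):
    print(n, error_free_fraction(n))
```

With per-base error rate p, the error-free probability is roughly (1 - p_del)^n (1 - p_ins)^n, i.e. it falls exponentially in n, which is what the simulation exhibits.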
Variable γ-ray emission from blazars, one of the most powerful classes of astronomical sources featuring relativistic jets, is a widely discussed topic. In this work, we present the results of a variability study of a sample of 20 blazars using γ-ray (0.1–300 GeV) observations from the Fermi/LAT telescope. Using maximum likelihood estimation (MLE) methods, we find that the probability density functions that best describe the γ-ray blazar flux distributions belong to the stable distribution family, which generalizes the Gaussian distribution. The results suggest that the average behaviour of the γ-ray flux variability over this period can be characterized by log-stable distributions. For most of the sample sources, this estimate leads to the standard lognormal distribution (α = 2). However, a few sources clearly display heavy-tailed distributions (MLE leads to α < 2), suggesting underlying multiplicative processes of infinite variance. Furthermore, the light curves were analyzed by employing novel non-stationarity and autocorrelation analyses. The former analysis allowed us to quantitatively evaluate non-stationarity in each source by finding the forgetting rate (corresponding to decay time) maximizing the log-likelihood for the modeled evolution of the probability density functions. Additionally, evaluation of local variability allows us to detect local anomalies, suggesting a transient nature of some of the statistical properties of the light curves. With the autocorrelation analysis, we examined the lag dependence of the statistical behaviour of all the {(y_t, y_{t+l})} points, described by various mixed moments, allowing us to quantitatively evaluate multiple characteristic time scales and implying possible hidden periodic processes.
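The α = 2 (lognormal) case of the log-stable fit can be illustrated with a toy MLE comparison in SciPy. The flux values below are synthetic placeholders, not Fermi/LAT data, and the comparison against a plain Gaussian stands in for the paper's fuller model selection:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Toy "light curve": multiplicative variability produces lognormal fluxes.
flux = rng.lognormal(mean=-9.5, sigma=0.8, size=1000)

# MLE fit of a lognormal, i.e. the alpha = 2 member of the log-stable family.
shape, loc, scale = stats.lognorm.fit(flux, floc=0)
loglik_lognorm = stats.lognorm.logpdf(flux, shape, loc, scale).sum()

# Baseline: additive (plain Gaussian) model of the raw flux.
mu, sigma = stats.norm.fit(flux)
loglik_norm = stats.norm.logpdf(flux, mu, sigma).sum()

print(loglik_lognorm, loglik_norm)
```

For the heavy-tailed sources with α < 2, `scipy.stats.levy_stable` provides the general stable family (applied to log-flux), though its MLE fit is considerably more expensive than the lognormal case shown here.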