We consider the problem of sampling from the Potts model on random regular graphs. It is conjectured that sampling is possible when the temperature of the model is in the so-called uniqueness regime of the regular tree, but positive algorithmic results have been for the most part elusive. In this paper, for all integers q ≥ 3 and ∆ ≥ 3, we develop algorithms that produce samples within error o(1) from the q-state Potts model on random ∆-regular graphs, whenever the temperature is in uniqueness, for both the ferromagnetic and antiferromagnetic cases. The algorithm for the antiferromagnetic Potts model is based on iteratively adding the edges of the graph and resampling a bichromatic class that contains the endpoints of the newly added edge. Key to the algorithm is how to perform the resampling step efficiently since bichromatic classes can potentially induce linear-sized components. To this end, we exploit the tree uniqueness to show that the average growth of bichromatic components is typically small, which allows us to use correlation decay algorithms for the resampling step. While the precise uniqueness threshold on the tree is not known for general values of q and ∆ in the antiferromagnetic case, our algorithm works throughout uniqueness regardless of its value. In the case of the ferromagnetic Potts model, we are able to simplify the algorithm significantly by utilising the random-cluster representation of the model. In particular, we demonstrate
We consider spin systems with nearest-neighbor interactions on an n-vertex d-dimensional cube of the integer lattice graph Z^d. We study the effects that exponential decay with distance of spin correlations, specifically the strong spatial mixing condition (SSM), has on the rate of convergence to equilibrium of non-local Markov chains. We prove that SSM implies O(log n) mixing of a block dynamics whose steps can be implemented efficiently. We then develop a methodology, consisting of several new comparison inequalities concerning various block dynamics, that allow us to extend this result to other non-local dynamics. As a first application of our method we prove that, if SSM holds, then the relaxation time (i.e., the inverse spectral gap) of general block dynamics is O(r), where r is the number of blocks. A second application of our technology concerns the Swendsen-Wang dynamics for the ferromagnetic Ising and Potts models. We show that SSM implies an O(1) bound for the relaxation time. As a by-product of this implication we observe that the relaxation time of the Swendsen-Wang dynamics in square boxes of Z^2 is O(1) throughout the subcritical regime of the q-state Potts model, for all q ≥ 2. We also prove that for monotone spin systems SSM implies that the mixing time of systematic scan dynamics is O(log n (log log n)^2). Systematic scan dynamics are widely employed in practice but have proved hard to analyze. Our proofs use a variety of techniques for the analysis of Markov chains including coupling, functional analysis and linear algebra.
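For readers unfamiliar with the Swendsen-Wang dynamics named in this abstract, the following is a minimal sketch of one step of the chain on an n × n box with free boundary. It is not taken from the paper. It assumes the common convention in which a configuration has weight exp(beta × number of monochromatic edges), so that a monochromatic edge is kept with probability 1 - exp(-beta); other conventions rescale beta. The function name `swendsen_wang_step` and the row-major spin layout are illustrative choices.

```python
import math
import random


def swendsen_wang_step(spins, n, q, beta, rng):
    """One Swendsen-Wang update for the ferromagnetic q-state Potts model.

    spins: list of n*n colors in range(q), row-major on an n x n grid
           with free boundary (an assumed layout, not from the abstract).
    beta:  inverse temperature, assuming weight exp(beta * #monochromatic edges).
    """
    parent = list(range(n * n))  # union-find forest over grid vertices

    def find(x):
        # Find the component root, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    keep_prob = 1.0 - math.exp(-beta)

    # Percolation step: keep each monochromatic edge independently.
    for i in range(n):
        for j in range(n):
            v = i * n + j
            for di, dj in ((0, 1), (1, 0)):  # right and down neighbors
                a, b = i + di, j + dj
                if a < n and b < n:
                    u = a * n + b
                    if spins[u] == spins[v] and rng.random() < keep_prob:
                        parent[find(u)] = find(v)

    # Recoloring step: each connected component gets a fresh uniform color.
    new_color = {}
    result = []
    for v in range(n * n):
        root = find(v)
        if root not in new_color:
            new_color[root] = rng.randrange(q)
        result.append(new_color[root])
    return result
```

Each step updates every vertex at once, which is why the chain counts as non-local in the sense used by the abstract.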
K-mer-based methods are widely used in bioinformatics, but there are many gaps in our understanding of their statistical properties. Here, we consider the simple model where a sequence S (e.g. a genome or a read) undergoes a simple mutation process whereby each nucleotide is mutated independently with some probability r, under the assumption that there are no spurious k-mer matches. How does this process affect the k-mers of S? We derive the expectation and variance of the number of mutated k-mers and of the number of islands (a maximal interval of mutated k-mers) and oceans (a maximal interval of non-mutated k-mers). We then derive hypothesis tests and confidence intervals for r given an observed number of mutated k-mers, or, alternatively, given the Jaccard similarity (with or without minhash). We demonstrate the usefulness of our results using a few select applications: obtaining a confidence interval to supplement the Mash distance point estimate, filtering out reads during alignment by Minimap2, and rating long read alignments to a de Bruijn graph by Jabba.
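The mutation process described in this abstract can be simulated directly. The sketch below is illustrative and is not the authors' code. It assumes a mutation is a substitution to a different nucleotide, and it treats a k-mer as mutated when at least one of its k positions is mutated. Under independent mutations each k-mer stays unmutated with probability (1 - r)^k, so with L k-mers the expected number of mutated k-mers is L(1 - (1 - r)^k). The function names are hypothetical.

```python
import random


def mutate(seq, r, rng):
    """Mutate each nucleotide independently with probability r.

    Assumption: a mutation substitutes a different base from ACGT.
    Returns the mutated sequence and a per-position list of mutation flags.
    """
    alphabet = "ACGT"
    out, flags = [], []
    for ch in seq:
        if rng.random() < r:
            out.append(rng.choice([b for b in alphabet if b != ch]))
            flags.append(True)
        else:
            out.append(ch)
            flags.append(False)
    return "".join(out), flags


def count_mutated_kmers(flags, k):
    """Count k-mers that span at least one mutated position (needs len(flags) >= k)."""
    num_kmers = len(flags) - k + 1
    window = sum(flags[:k])  # mutated positions inside the current k-mer
    count = 1 if window > 0 else 0
    for i in range(1, num_kmers):
        # Slide the window one position to the right.
        window += flags[i + k - 1] - flags[i - 1]
        if window > 0:
            count += 1
    return count


def expected_mutated_kmers(num_kmers, k, r):
    """Expected number of mutated k-mers: L * (1 - (1 - r)^k)."""
    return num_kmers * (1.0 - (1.0 - r) ** k)
```

Averaging `count_mutated_kmers` over many simulated runs and comparing against `expected_mutated_kmers` gives a quick sanity check of the expectation. The variance is harder to check this way because overlapping k-mers are dependent.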
A universal cycle (u-cycle) is a compact listing of a collection of combinatorial objects. In this paper, we use natural encodings of these objects to show the existence of u-cycles for collections of subsets, matroids, restricted multisets, chains of subsets, multichains, and lattice paths. For subsets, we show that a u-cycle exists for the k-subsets of an n-set if we let k vary in a nonzero-length interval. We use this result to construct a "covering" of length (1 + o(1)) (n choose k) for all subsets of [n] of size exactly k with a specific formula for the o(1) term. We also show that u-cycles exist for all n-length words over some alphabet Σ, which contain all characters from R ⊂ Σ. Using this result we provide u-cycles for encodings of Sperner families of size 2 and proper chains of subsets.
Motivation: Sketching is now widely used in bioinformatics to reduce data size and increase data processing speed. Sketching approaches entice with improved scalability but also carry the danger of decreased accuracy and added bias. In this article, we investigate the minimizer sketch and its use to estimate the Jaccard similarity between two sequences. Results: We show that the minimizer Jaccard estimator is biased and inconsistent, which means that the expected difference (i.e. the bias) between the estimator and the true value is not zero, even in the limit as the lengths of the sequences grow. We derive an analytical formula for the bias as a function of how the shared k-mers are laid out along the sequences. We show both theoretically and empirically that there are families of sequences where the bias can be substantial (e.g. the true Jaccard can be more than double the estimate). Finally, we demonstrate that this bias affects the accuracy of the widely used mashmap read mapping tool. Availability and implementation: Scripts to reproduce our experiments are available at https://github.com/medvedevgroup/minimizer-jaccard-estimator/tree/main/reproduce. Supplementary information: Supplementary data are available at Bioinformatics online.
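The estimator discussed in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their scripts are at the URL above). It assumes the sketch is the set of minimizer k-mers, where each window of w consecutive k-mers contributes its k-mer with the smallest hash, and that the estimate is the Jaccard similarity of the two sketches. The BLAKE2b-based hash, the leftmost tie-break, and all function names are assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """All k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (an arbitrary choice of hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_sketch(seq, k, w):
    """Set of k-mers that have the minimum hash in some window of w consecutive k-mers."""
    ks = kmers(seq, k)
    hashes = [kmer_hash(x) for x in ks]
    sketch = set()
    for i in range(len(ks) - w + 1):
        # min() returns the first minimum, so ties go to the leftmost k-mer.
        j = min(range(i, i + w), key=lambda p: hashes[p])
        sketch.add(ks[j])
    return sketch


def jaccard(a, b):
    """Jaccard similarity of two sets (defined here as 0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def true_jaccard(s, t, k):
    """Jaccard similarity of the full k-mer sets."""
    return jaccard(set(kmers(s, k)), set(kmers(t, k)))


def minimizer_jaccard(s, t, k, w):
    """Jaccard similarity of the two minimizer sketches."""
    return jaccard(minimizer_sketch(s, k, w), minimizer_sketch(t, k, w))
```

Comparing `minimizer_jaccard` with `true_jaccard` on pairs of related sequences shows the gap between the estimate and the true value that the abstract calls bias. How large that gap is depends on the sequences and on the hash.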