Models of sequence evolution typically assume that all sequences are possible. However, restriction enzymes that cut DNA at specific recognition sites provide an example where carrying a recognition site can be lethal. Motivated by this observation, we studied the set of strings over a finite alphabet with taboos, that is, with prohibited substrings. The taboo-set is referred to as $\mathbb{T}$ and any allowed string as a taboo-free string. We consider the so-called Hamming graph $\Gamma_n(\mathbb{T})$, whose vertices are taboo-free strings of length $n$ and whose edges connect two taboo-free strings if their Hamming distance equals one. Any (random) walk on this graph describes the evolution of a DNA sequence that avoids taboos. We describe the construction of the vertex set of $\Gamma_n(\mathbb{T})$. Then we state conditions under which $\Gamma_n(\mathbb{T})$ and its suffix subgraphs are connected. Moreover, we provide an algorithm that determines whether all these graphs are connected for an arbitrary $\mathbb{T}$. As an application of the algorithm, we show that about 87% of bacteria listed in REBASE have a taboo-set that induces connected taboo-free Hamming graphs, because they have fewer than four type II restriction enzymes. On the other hand, four properly chosen taboos are enough to disconnect one suffix subgraph, and consequently connectivity of taboo-free Hamming graphs could change depending on the composition of restriction sites.
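To make the graph definition concrete, here is a minimal brute-force sketch in Python. It enumerates the taboo-free strings of a given length, links strings at Hamming distance one, and tests connectivity with a breadth-first search. This is not the algorithm from the abstract: it is exponential in the string length and only practical for small cases. The function names, the default alphabet `"ACGT"`, and the convention that an empty graph counts as connected are all assumptions made for illustration.

```python
from collections import deque
from itertools import product


def taboo_free_strings(n, taboos, alphabet="ACGT"):
    """Return all length-n strings over `alphabet` that contain no taboo as a substring."""
    return [
        s
        for s in ("".join(p) for p in product(alphabet, repeat=n))
        if not any(t in s for t in taboos)
    ]


def hamming_neighbors(s, vertex_set, alphabet="ACGT"):
    """Yield the taboo-free strings at Hamming distance exactly one from `s`."""
    for i, original in enumerate(s):
        for letter in alphabet:
            if letter != original:
                candidate = s[:i] + letter + s[i + 1:]
                if candidate in vertex_set:
                    yield candidate


def is_connected(n, taboos, alphabet="ACGT"):
    """Check connectivity of the taboo-free Hamming graph by breadth-first search."""
    vertices = set(taboo_free_strings(n, taboos, alphabet))
    if not vertices:
        return True  # assumption: treat the empty graph as connected
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in hamming_neighbors(current, vertices, alphabet):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == vertices
```

For example, over the binary alphabet `"01"` with taboos `01` and `10`, the only taboo-free strings of length 2 are `00` and `11`. They differ in two positions, so `is_connected(2, ["01", "10"], alphabet="01")` returns `False`.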
We consider the transmission of a state from the root of a tree towards its leaves, assuming that each transmission occurs through a noisy channel. The states at the leaves are observed, while at deeper nodes we can compute the likelihood of each state given the observation. In this sense, information flows from child nodes towards the parent node. Here we find an upper bound on this children-to-parent information flow. To do so, we first introduce a new measure of information, the memory vector, whose norm quantifies whether all states have the same likelihood. Then we find conditions such that the norm of the memory vector at the parent node can be linearly bounded by the sum of norms at the child nodes. We also describe the reconstruction problem of estimating the ancestral state at the root given the observation at the leaves. We infer sufficient conditions under which the original state at the root cannot be confidently reconstructed using the observed leaves, assuming that the number of levels from the root to the leaves is large.
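The children-to-parent computation of likelihoods can be sketched with the standard upward (pruning) recursion. This is a minimal illustration under assumptions not stated in the abstract: a tree encoded as nested tuples with string leaf names, and the same channel matrix on every edge, where `channel[s][t]` is the probability that a child has state `t` given that its parent has state `s`. The memory vector is not defined in the abstract, so the sketch stops at normalized likelihoods. All names are hypothetical.

```python
def upward_likelihood(tree, channel, observed):
    """Return L, where L[s] = P(observed leaves below this node | node state = s).

    tree     : a leaf name (str) or a tuple of subtrees (assumed encoding)
    channel  : square row-stochastic matrix, channel[s][t] = P(child = t | parent = s)
    observed : dict mapping leaf name -> observed state index
    """
    k = len(channel)
    if isinstance(tree, str):
        # A leaf is observed, so its likelihood is an indicator vector.
        return [1.0 if s == observed[tree] else 0.0 for s in range(k)]
    likelihood = [1.0] * k
    for child in tree:
        child_like = upward_likelihood(child, channel, observed)
        for s in range(k):
            # Sum over the child's possible states, weighted by the channel.
            likelihood[s] *= sum(channel[s][t] * child_like[t] for t in range(k))
    return likelihood


def normalize(vector):
    """Scale a non-negative vector with positive sum so its entries sum to one."""
    total = sum(vector)
    return [x / total for x in vector]
```

With a completely noisy binary channel, `[[0.5, 0.5], [0.5, 0.5]]`, every state has the same likelihood at the parent whatever the leaves show, which is the situation in which the observation carries no information about the ancestral state.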
Models of sequence evolution typically assume that all sequences are possible. However, restriction enzymes that cut DNA at specific recognition sites provide an example where carrying a recognition sequence can be lethal. Motivated by this observation, we studied the set of strings over a finite alphabet with taboos, that is, with prohibited substrings. The taboo-set is referred to as $T$ and any allowed string as a taboo-free string. We consider the graph $\Gamma_n(T)$ whose vertices are taboo-free strings of length $n$ and whose edges connect two taboo-free strings if their Hamming distance equals 1. Any (random) walk on this graph describes the evolution of a DNA sequence that avoids deleterious taboos. We describe the construction of the vertex set of $\Gamma_n(T)$. Then we state conditions under which $\Gamma_n(T)$ and its suffix subgraphs are connected. Moreover, we provide a simple algorithm that can determine, for an arbitrary $T$, whether all these graphs are connected. We concluded that bacterial taboo-free Hamming graphs are nearly always connected, although 4 properly chosen taboos are enough to disconnect one of the suffix subgraphs.