In 1953, Shannon posed the question of quantifying structural information in order to analyze communication systems. The question has since become one of the longest-standing grand challenges in information science and computer science. Here, we propose the first metric for structural information. Given a graph G, we define the K-dimensional structural information of G (or the structure entropy of G), denoted by H^K(G), as the minimum overall number of bits required to determine the K-dimensional code of the node reached by a random walk in G. The K-dimensional structural information provides a principle for completely detecting the natural or true structure, which consists of the rules, regulations and orders of a graph, for fully distinguishing order from disorder in structured noisy data, and for analyzing communication systems, thereby resolving Shannon's problem and opening up new directions. The K-dimensional structural information is also the first metric of the dynamical complexity of networks, measuring the complexity of interactions, communications, operations and even the evolution of networks. The metric satisfies a number of fundamental properties, including additivity, locality, robustness, and local and incremental computability. We establish the fundamental theorems of the one- and two-dimensional structural information of networks, including both lower and upper bounds of the metrics for classic data structures, general graphs, networks of models and networks of natural evolution. We propose algorithms that approximate the K-dimensional structural information of a graph by finding the K-dimensional structure of the graph that minimizes the K-dimensional structure entropy. We find that K-dimensional structure entropy minimization is the principle for detecting the natural or true structures in real-world networks. Consequently, our structural information provides the foundation for knowledge discovery from noisy data.
We establish a black hole principle by using the two-dimensional structural information of graphs. We propose the natural rank of locally listing algorithms via the structure entropy minimization principle, providing the basis for a next-generation search engine.
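For a concrete sense of the metric, the one-dimensional structure entropy reduces to the Shannon entropy of the random walk's stationary distribution, which on an undirected graph is proportional to node degrees. The following is a minimal sketch (the function name is ours, not from the paper):

```python
import math

def one_dim_structure_entropy(edges):
    """One-dimensional structure entropy H^1(G) of an undirected graph.

    With 2m edge endpoints in total, the stationary distribution of a
    random walk assigns node v probability d_v / 2m, so
    H^1(G) = -sum_v (d_v / 2m) * log2(d_v / 2m).
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    two_m = sum(degree.values())  # = 2 * number of edges
    return -sum((d / two_m) * math.log2(d / two_m) for d in degree.values())

# On a d-regular graph with n nodes the walk is uniform, so H^1 = log2(n):
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(one_dim_structure_entropy(cycle4))  # 2.0
```

Regular graphs maximize this quantity for a fixed node count; any degree heterogeneity (e.g. a star graph) lowers it below log2(n).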
Submegabase-size topologically associating domains (TADs) have been observed in high-throughput chromatin interaction (Hi-C) data. However, accurate detection of TADs depends on ultra-deep sequencing and sophisticated normalization procedures. Here we propose a fast and normalization-free method to decode the domains of chromosomes (deDoc) that utilizes structural information theory. Treating the Hi-C contact matrix as the representation of a graph, deDoc partitions the graph into segments with minimal structural entropy. We show that structural entropy can also be used to determine the proper bin size of Hi-C data. By applying deDoc to pooled Hi-C data from 10 single cells, we detect megabase-size TAD-like domains. This result implies that the modular structure of the genome's spatial organization may be fundamental even to a small cohort of single cells. Our algorithms may facilitate systematic investigations of chromosomal domains on a larger scale than has hitherto been possible.
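The objective deDoc minimizes can be illustrated with the two-dimensional structure entropy of a graph under a node partition, in the Li-Pan form: the cost of locating the random walker within its module plus the cost of naming the module, weighted by the module's cut size. A sketch under the assumptions of unit edge weights and our own function names:

```python
import math

def two_dim_structure_entropy(edges, partition):
    """Two-dimensional structure entropy H^2(G; P) of an undirected graph
    under partition P (a list of node sets). For module j with volume V_j
    (total degree) and g_j edges crossing its boundary, the contribution is
    -sum_{v in j} (d_v/2m) log2(d_v/V_j) - (g_j/2m) log2(V_j/2m)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    two_m = sum(degree.values())
    module_of = {v: j for j, part in enumerate(partition) for v in part}
    entropy = 0.0
    for j, part in enumerate(partition):
        vol = sum(degree[v] for v in part)  # V_j: total degree inside module j
        cut = sum(1 for u, v in edges
                  if (module_of[u] == j) != (module_of[v] == j))  # g_j
        entropy -= sum((degree[v] / two_m) * math.log2(degree[v] / vol)
                       for v in part)
        entropy -= (cut / two_m) * math.log2(vol / two_m)
    return entropy

# Two triangles joined by a bridge: the natural split has lower entropy
# than an arbitrary one, which is the partitioning principle deDoc exploits.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = [{0, 1, 2}, {3, 4, 5}]
bad = [{0, 3, 4}, {1, 2, 5}]
assert two_dim_structure_entropy(edges, good) < two_dim_structure_entropy(edges, bad)
```

Minimizing this quantity over contiguous segments of the contact-matrix graph is what yields the domain boundaries.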
In this study, we propose a method for constructing cell sample networks from gene expression profiles, and a structural entropy minimisation principle for detecting the natural structure of networks and for identifying cancer cell subtypes. Our method establishes a three-dimensional gene map of cancer cell types and subtypes. Each identified subtype is defined by a unique gene expression pattern, and the three-dimensional gene map is established by defining this unique pattern for each identified subtype across cancers, including acute leukaemia, lymphoma, multi-tissue cancer, lung cancer and healthy tissue. Our three-dimensional gene map demonstrates that a true tumour type may be divided into subtypes, each defined by a unique gene expression pattern. Clinical data analyses demonstrate that most cell samples of an identified subtype share similar survival times, survival indicators and International Prognostic Index (IPI) scores, and indicate that distinct subtypes identified by our algorithms exhibit different overall survival times, survival ratios and IPI scores. Our three-dimensional gene map establishes a high-definition, one-to-one map between biologically and medically meaningful tumour subtypes and gene expression patterns, and identifies remarkable cells that form singleton submodules.

One of the challenges of cancer treatment is targeting specific therapies to pathogenetically distinct tumour types to maximise treatment efficacy and minimise toxicity. Traditionally, cancer classification has been based on the morphological appearance of the tumour; however, this approach has serious limitations. Tumours with similar histopathological appearances can have different clinical courses and exhibit different responses to therapy. Molecular heterogeneity within individual cancer diagnostic categories is also evident in the variable presence of chromosomal translocations, deletions of tumour suppressor genes and numerical chromosomal abnormalities.
Cancer classification is difficult because it relies on specific biological insights rather than on systematic, comprehensive, global and unbiased methods for identifying tumour subtypes.

Over the past decade, the increased availability of large-scale gene expression profiles has led researchers to propose a number of new approaches for classifying tumour types or subtypes based on gene expression analyses. Golub et al.1 have proposed a neighbourhood analysis to distinguish known types, and a "class predictor" that assigns a new sample to a known class based purely on its gene expression profile, and have verified their methods on an acute leukaemia dataset. Alizadeh et al.2 have proposed a method based on hierarchical clustering, which divides diffuse large B-cell lymphomas into two subtypes. Ramaswamy et al.3 have proposed a "classifier" based on a support vector machine (SVM) and have analysed the accuracy of true type predictions for both snap-frozen human tumour and normal tissue specimens. Yeoh et al.4 have analysed sets ...
Topologically associating domains (TADs) are a key structure of 3D mammalian genomes. However, the prevalence and dynamics of TAD-like domains in single cells remain elusive. Here we develop a new algorithm, named deTOKI, to decode TAD-like domains from single-cell Hi-C data. Using non-negative matrix factorization, deTOKI seeks regions that insulate the genome into blocks with minimal chance of clustering. deTOKI outperforms competing tools and reliably identifies TAD-like domains in single cells. Finally, we find that TAD-like domains are not only prevalent, but also subject to tight regulation, in single cells.
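The core numerical step, non-negative matrix factorization, can be sketched generically with the classic Lee-Seung multiplicative updates. This is a toy illustration of NMF recovering block structure in a synthetic contact matrix, not deTOKI's actual pipeline; all names are ours:

```python
import numpy as np

def nmf(X, k, iters=500, seed=0):
    """Rank-k non-negative factorization X ~ W @ H via Lee-Seung
    multiplicative updates (a generic sketch, not deTOKI itself)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(iters):
        # Each update keeps W, H non-negative and does not increase
        # the Frobenius reconstruction error.
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy "contact matrix" with two diagonal blocks (two domains):
X = np.kron(np.eye(2), np.ones((4, 4)))
W, H = nmf(X, k=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
# For this exactly rank-2 matrix the factorization is near-exact, and
# W.argmax(axis=1) groups rows by diagonal block (up to label swap).
```

In the single-cell setting, the analogous factor loadings along the genomic axis are what suggest insulating boundaries between blocks.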
It is known that cooperation in evolutionary prisoner's dilemma games fails to emerge in homogeneous networks such as random graphs. Here we propose a quantum prisoner's dilemma game. The game consists of two players, each of whom has three strategies: cooperator (C), defector (D) and super cooperator (denoted by Q). We find that quantum entanglement guarantees the emergence of a new form of cooperation, the super cooperation of quantum prisoner's dilemma games, and that entanglement is the mechanism that guarantees the emergence of cooperation in evolutionary prisoner's dilemma games on networks. We show that for a game with temptation b, there exists a threshold for the measurement of entanglement beyond which (super) cooperation of evolutionary quantum prisoner's dilemma games is guaranteed to emerge quickly, giving rise to stochastic convergence of cooperation; that if the entanglement degree γ is below the threshold, then the equilibrium frequency of cooperation is positively correlated with γ; and that if γ is below the threshold and b exceeds some boundary, then the equilibrium frequency of cooperation on random graphs decreases as the average degree of the graphs increases.
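The three-strategy setup can be made concrete with the standard Eisert-Wilkens-Lewenstein (EWL) quantization of the prisoner's dilemma, in which Q is the usual "super cooperator". The abstract does not specify its payoff values or quantization scheme, so this sketch assumes the EWL scheme with the textbook payoffs (R, S, T, P) = (3, 0, 5, 1) and entangling parameter γ:

```python
import numpy as np

# Payoff matrix: rows/cols are measured moves 0 = cooperate, 1 = defect.
PAYOFF_A = np.array([[3, 0], [5, 1]])  # (R, S, T, P) = (3, 0, 5, 1)

# One-qubit strategies in the EWL scheme.
C = np.eye(2)                      # classical cooperation
D = np.array([[0, 1], [-1, 0]])    # classical defection
Q = np.array([[1j, 0], [0, -1j]])  # "super cooperator"

def payoffs(UA, UB, gamma):
    """Expected payoffs in the two-player EWL quantum prisoner's dilemma
    with entangling gate J = exp(i * gamma * (D kron D) / 2)."""
    DD = np.kron(D, D)
    J = np.cos(gamma / 2) * np.eye(4) + 1j * np.sin(gamma / 2) * DD
    psi = J.conj().T @ np.kron(UA, UB) @ J @ np.array([1, 0, 0, 0])
    probs = np.abs(psi) ** 2  # outcome probabilities over 00, 01, 10, 11
    outcomes = [(0, 0), (0, 1), (1, 0), (1, 1)]
    pa = sum(p * PAYOFF_A[a, b] for p, (a, b) in zip(probs, outcomes))
    pb = sum(p * PAYOFF_A[b, a] for p, (a, b) in zip(probs, outcomes))
    return pa, pb

# With maximal entanglement (gamma = pi/2), Q neutralizes defection:
print(payoffs(Q, Q, np.pi / 2))  # ~(3, 3)
print(payoffs(Q, D, np.pi / 2))  # ~(5, 0): the defector is punished
print(payoffs(C, D, 0.0))        # (0, 5): no entanglement, classical game
```

At γ = 0 the game reduces to the classical dilemma, while at maximal entanglement defection against Q earns the sucker payoff, which is the mechanism behind the threshold behaviour described above.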
The purpose of this short paper is to clarify and correct the proof of the main result contained in [1]: namely, that there exists no low maximal d. c. e. degree. There we gave a simple proof obtained as an immediate corollary of the following posited extension of the Robinson Splitting Theorem ([1, Theorem 1.7]): for any c. e. set A, ... Denis Hirschfeldt (private communication) was the first to notice a problem with the particular application of the Recursion Theorem in the proof of this result, one which does not occur in the original Robinson proof [5]. We present below the following reformulation of the use of the Recursion Theorem, sufficient to correct the proof of our main result (the non-existence of a low maximal d. c. e. degree), via a degree-theoretic extension of the Robinson Splitting Theorem.

Theorem. For any d. c. e. degree l and any c. e. degree a, if l is low and l < a, then there are d. c. e. degrees a_0, a_1 such that l < a_0, a_1 < a and a_0 ∪ a_1 = a.

Proof. Let L be a d. c. e. set of low degree. Let L = L_0 − L_1 for some c. e. sets L_0, L_1 such that L_0 ⊃ L_1. Let f be a 1-1 computable function such that L_0 = {f(x) : x ∈ ω}, and let M = f^{−1}(L_1). Then M is c. e. and M ≤_T L. (M is called Lachlan's set for L.)

Given a c. e. set A and a d. c. e. set L, assume that L <_T A and that L has low degree. First we construct ω-c. e. sets A_0, A_1 to satisfy the following requirements: ... where i = 0, 1, e ∈ ω, and {Φ_e : e ∈ ω} is a standard list of all partial computable (p. c.) functionals Φ. Let a = deg_T(A), l = deg_T(L) and a_i = deg_T(A_i ⊕ L) for i = 0, 1. Then a_i is an ω-c. e. degree for each i = 0, 1. By the R-requirement, a_0 ∪ a_1 = a, and by the S-requirements, l ≤ a_i < a for i ∈ {0, 1}. Then by the S- and R-requirements we have l < a_i.

We construct the ω-c. e. sets so that they are c. e. in Lachlan's set M for L. The theorem follows from the following result from [2].

Proposition 1 (Arslanov, LaForte and Slaman [2]).
Let A and C be sets such that C is c. e., A is c. e. in C, C ≤_T A, and A is ω-c. e. Then deg(A) is d. c. e.
We investigate algorithmic problems arising from the homophyly phenomenon in networks. Given an undirected graph G = (V, E) and a vertex coloring c : V → {1, 2, ..., k} of G, we say that a vertex v ∈ V is happy if v shares the same color with all its neighbors, and unhappy otherwise, and that an edge e ∈ E is happy if its two endpoints have the same color, and unhappy otherwise. Given a partial vertex coloring c of G, the Maximum Happy Vertices problem (MHV, for short) asks to color all the remaining vertices so that the number of happy vertices is maximized, and the Maximum Happy Edges problem (MHE, for short) asks to color all the remaining vertices so that the number of happy edges is maximized.

Let k be the number of colors allowed in the problems. We show that both MHV and MHE can be solved in polynomial time if k = 2, and that both MHV and MHE are NP-hard if k ≥ 3. We devise a max{1/k, Ω(Δ^{-3})}-approximation algorithm for the MHV problem, where Δ is the maximum degree of vertices in the input graph, and a 1/2-approximation algorithm for the MHE problem. This is the first theoretical progress on these two natural and fundamental problems.
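The definitions can be checked on a small instance with a brute-force MHV solver that tries every completion of the partial coloring. This is exponential in the number of uncolored vertices, so it is for illustration only; the function name is ours:

```python
from itertools import product

def max_happy_vertices(n, edges, precolor, k):
    """Brute-force MHV: extend the partial coloring `precolor`
    (dict vertex -> color in 1..k) to all n vertices, maximizing the
    number of vertices whose neighbors all share their color."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    free = [v for v in range(n) if v not in precolor]
    best = 0
    for assignment in product(range(1, k + 1), repeat=len(free)):
        c = {**precolor, **dict(zip(free, assignment))}
        happy = sum(1 for v in range(n)
                    if all(c[u] == c[v] for u in adj[v]))
        best = max(best, happy)
    return best

# A path 0-1-2-3 whose endpoints are precolored with different colors:
# whichever way the middle is colored, at most two vertices can be happy.
print(max_happy_vertices(4, [(0, 1), (1, 2), (2, 3)], {0: 1, 3: 2}, k=2))  # 2
```

The exact polynomial-time algorithms for k = 2 instead reduce the problems to minimum-cut computations; the brute force above only serves to make the objective concrete.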