We present a new approach to inferring the global geometric state of chromatin from Hi-C data. Chromatin conformation capture techniques (3C and its variants: 4C, 5C, Hi-C, etc.) probe the spatial structure of the genome by identifying physical contacts between genomic loci within the nuclear space. In whole-genome conformation capture (Hi-C) experiments, the signal can be interpreted as spatial proximity between genomic loci, and physical distances can be estimated from the data. However, the results of these estimations suffer from internal geometric inconsistencies, notoriously violating the triangle inequality. Here we propose that the inconsistencies may be caused not by experimental artifacts but rather by a mixture of cells, each in one of several conformational states, contained in the sample. We have developed and implemented a graph-theoretic approach that identifies the properties of these subpopulations. We show that the geometrical conflicts in a standard yeast Hi-C dataset can be explained by only a small number of homogeneous populations of cells (4 populations are sufficient to reconcile the 95,000 most prominent impossible triangles; 8 populations can explain the top 375,000 geometric conflicts). Finally, we analyze the functional annotations of genes differentially interacting between the populations, suggesting that each inferred subpopulation may be involved in a functionally different transcriptional program.
Author Summary
The global conformation of chromatin within a nucleus plays an important role in the regulation of genes. The Hi-C technique can detect proximity between genomic loci, but attempts to use Hi-C data to infer the global conformation lead to hundreds of thousands of impossible geometries that violate the triangle inequality. To date, there has been no explanation for this phenomenon. Here, we resolve these discrepancies by modeling the sample as a mixture of nuclei in several conformation states. We have developed a graph-theoretic approach to characterize the shape of chromatin in each of these states. In a real-life situation, as few as 4-5 discrete subpopulations can resolve all spatial inconsistencies in the data. Moreover, the results suggest that each subpopulation is associated with a functionally specific transcriptional program.
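To make the notion of an "impossible triangle" concrete, the following is a minimal sketch, not the method used in this work. It assumes that pairwise distances between loci have already been estimated from Hi-C contact frequencies and stored in a symmetric matrix. The function name `impossible_triangles`, the tolerance `tol`, and the toy matrix are illustrative assumptions. The sketch enumerates all triples of loci and reports those whose longest side exceeds the sum of the other two.

```python
from itertools import combinations

import numpy as np


def impossible_triangles(dist, tol=1e-9):
    """List triples of loci whose estimated distances violate the triangle inequality.

    dist: symmetric (n x n) array of estimated pairwise distances (hypothetical input).
    Returns (i, j, k, excess) tuples sorted by decreasing excess, where excess is
    the longest side minus the sum of the two shorter sides.
    """
    n = dist.shape[0]
    violations = []
    for i, j, k in combinations(range(n), 3):
        sides = sorted([dist[i, j], dist[j, k], dist[i, k]])
        excess = sides[2] - (sides[0] + sides[1])
        if excess > tol:
            violations.append((i, j, k, float(excess)))
    # Most prominent conflicts first.
    violations.sort(key=lambda v: v[3], reverse=True)
    return violations


# Toy example with four loci (made-up numbers, not real Hi-C estimates).
toy = np.array([
    [0.0, 1.0, 5.0, 2.0],
    [1.0, 0.0, 1.0, 2.0],
    [5.0, 1.0, 0.0, 2.0],
    [2.0, 2.0, 2.0, 0.0],
])
found = impossible_triangles(toy)
for i, j, k, excess in found:
    print(f"loci ({i}, {j}, {k}) violate the triangle inequality by {excess:.1f}")
```

In the toy matrix, loci 0 and 2 are each close to locus 1 yet far from each other, which no single three-dimensional conformation can satisfy. Exhaustive enumeration scales cubically with the number of loci, so a genome-wide analysis would need a more economical strategy than this sketch.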