The single-molecule multiplex chromatin interaction data generated by emerging non-ligation-based 3D genome mapping technologies provide novel insights into high-dimensional chromatin organization, yet introduce new computational challenges. We developed MIA-Sig (https://github.com/TheJacksonLaboratory/mia-sig.git), an algorithmic framework to de-noise the data, assess the statistical significance of chromatin complexes, and identify topological domains and inter-domain contacts. On chromatin immunoprecipitation (ChIP)-enriched data, MIA-Sig can clearly distinguish protein-associated interactions from non-specific topological domains.
Main text

Previous 3D genome-mapping efforts have suggested complex chromosomal folding structures. In particular, methods based on high-throughput sequencing capture bulk chromatin contacts (Hi-C (Lieberman-Aiden et al., 2009)) or enrich for chromatin contacts involving a specific protein (ChIA-PET (Fullwood et al., 2009)). Both of these methods rely on proximity ligation, and therefore can only reveal population averages of pairwise contacts. Thus, they lack the ability to simultaneously capture multiple loci involved in a chromatin complex in an individual cell.

To overcome these limitations, novel experimental methods have recently been developed to capture multiplex chromatin contacts with single-molecule resolution. For instance, GAM (Beagrie et al., 2017) identifies multi-way interactions by capturing multiple DNA elements co-existing in a given nuclear slice, SPRITE (Quinodoz et al., 2018) barcodes individual chromatin complexes via a split-pool strategy, and ChIA-Drop (Zheng et al., 2019) partitions each complex into a microfluidic droplet for barcoding and amplification. Collectively, these emerging 3D genome-mapping technologies are advancing the frontier of the nuclear architecture field. However, as with other genomic approaches prone to background noise, the noisy and high-dimensional nature of the multiplex data poses unique computational challenges that cannot be readily addressed with existing tools, which are tailored for pairwise interaction data. Thus, we developed MIA-Sig (Multiplex Interactions Analysis by Signal processing algorithms), a set of Python modules tailored for ChIA-Drop and related data types.

A central analytic challenge is to distinguish true biological chromatin complexes from experimental noise. One possible source of noise is the encapsulation of two or more chromatin complexes in the same microfluidic droplet, where they are assigned the same barcode, yielding a multiplet (Figure 1a).
The same problem arises in microfluidics-based single-cell RNA-seq data, where it is resolved computationally via dimensionality reduction and clustering (Wolock et al., 2019). However, methods developed for single-cell