Linkage disequilibrium (LD) and genomic proximity are commonly used to map non-coding variants to genes, despite increasing examples of causal variants outside the LD block of the gene they regulate. We compared chromatin contacts in 22 cell types to LD across billions of pairs of loci in the human genome and found no concordance, even at genomic distances below 25 kilobases where both tend to be high. Gene expression and ontology data suggest that chromatin contacts identify regulatory variants more reliably than do LD and genomic proximity. We conclude that the genomic architectures of genetic and physical interactions are independent, with important implications for gene regulatory evolution and precision medicine.
bioRxiv preprint first posted online Feb. 26, 2018; doi: http://dx.doi.org/10.1101/272245. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Genetic variants ranging from large-scale chromosomal rearrangements to single nucleotide polymorphisms (SNPs) can impact gene function by altering exonic sequence or by changing gene regulation. Recent studies estimate that 93% of disease-associated variants are in non-coding DNA [1] and 60% of causal variants map to regulatory elements [2], accounting for 79% of phenotypic variance [3]. Additionally, disease-associated variants are enriched in regulatory regions [4], especially those from tissues relevant to the phenotype [5]. Functionally annotating non-coding variants and correctly mapping causal variants to the genes and pathways they affect is critical for understanding disease mechanisms and using genetics in precision medicine [6][7][8][9].
Common practice associates non-coding variants with the closest gene promoter or with promoters within the same LD block. However, regulatory variants can affect phenotypes by changing the expression of target genes up to several megabases (Mb) away [10][11][12][13], well beyond their LD block (median length ≈ 1-2 kb, Supplementary Table 1b). This prompted Corradin and colleagues to conclude that a gene's regulatory program is not related to local haplotype structure [14]. Even when a GWAS SNP is in LD with a gene that has a strong biological link to the phenotype, the causal variant may be in a nearby non-coding region regulating a different gene [15,16].
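For readers less familiar with how LD between two loci is quantified, the following is a minimal sketch of the standard r² statistic computed from phased haplotypes. The function name `ld_r2` and the toy haplotypes are illustrative assumptions; they are not code or data from this study.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic loci.

    `haplotypes` is a list of (a, b) tuples, one per phased chromosome,
    where a and b are 0/1 allele codes at locus A and locus B.
    (Hypothetical helper for illustration only.)
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    # Allele frequencies at each locus and frequency of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    # D measures the departure from independent assortment of alleles.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # r^2 is undefined when either locus is monomorphic.
        return float("nan")
    return d * d / denom


# Toy data (assumed): alleles always co-occur vs. assort independently.
linked = [(1, 1)] * 5 + [(0, 0)] * 5
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)] * 2
print(ld_r2(linked), ld_r2(unlinked))
```

An r² near 1 means the two alleles are almost always inherited together, while an r² near 0 means they assort independently. Neither value says anything about whether the two loci are in physical contact in the nucleus.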
Highlighting the long range of regulatory interactions, recent work in T cells found that only 14% of 684 autoimmune variants targeted their closest gene; 86% skipped one or more intervening genes to reach their target, and 64% of variants interacted with multiple genes [17]. Thus, many non-coding variants are far from, and in low LD with, the promoters they regulate.
Distal non-coding variants can cause changes in gene regulation and phenotypes via three-dimensional (3D) chromatin interactions. For example, the obesity-associated FTO variant (rs1421085) was found to disrupt an ARID5...