The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.
The glucocorticoid steroid hormone cortisol is released by the adrenal glands in response to stress and serves as a messenger in circadian rhythms. Transcriptional responses to this hormonal signal are mediated by the glucocorticoid receptor (GR). We determined GR binding throughout the human genome by using chromatin immunoprecipitation followed by next-generation DNA sequencing, and measured related changes in gene expression with mRNA sequencing in response to the glucocorticoid dexamethasone (DEX). We identified 4392 genomic positions occupied by the GR and 234 genes with significant changes in expression in response to DEX. This genomic census revealed striking differences between gene activation and repression by the GR. While genes activated with DEX treatment have GR bound within a median distance of 11 kb from the transcriptional start site (TSS), the nearest GR binding for genes repressed with DEX treatment is a median of 146 kb from the TSS, suggesting that DEX-mediated repression occurs independently of promoter-proximal GR binding. In addition to the dramatic differences in proximity of GR binding, we found differences in the kinetics of gene expression response for induced and repressed genes, with repression occurring substantially after induction. We also found that the GR can respond to different levels of corticosteroids in a gene-specific manner. For example, low doses of DEX selectively induced PER1, a transcription factor involved in regulating circadian rhythms.
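The median distance comparison above (11 kb for activated genes, 146 kb for repressed genes) can be illustrated with a small sketch. This is not the authors' pipeline: the function names are hypothetical, positions are assumed to be integer base-pair coordinates on a single chromosome, and at least one binding site is assumed to exist.

```python
from bisect import bisect_left
from statistics import median


def nearest_distance(tss, sorted_sites):
    """Distance in bp from one TSS to the closest binding site.

    Assumes sorted_sites is a non-empty, ascending list of positions
    on the same chromosome as the TSS.
    """
    i = bisect_left(sorted_sites, tss)
    # Only the sites immediately left and right of the TSS can be nearest.
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    return min(abs(tss - s) for s in candidates)


def median_nearest_distance(tss_positions, sites):
    """Median, over a set of TSSs, of the distance to the nearest site."""
    sorted_sites = sorted(sites)
    return median(nearest_distance(t, sorted_sites) for t in tss_positions)
```

Calling `median_nearest_distance` once on the TSSs of induced genes and once on the TSSs of repressed genes would give the two summary values being compared; a real analysis would also need to handle multiple chromosomes and strand.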
A complex interplay between transcription factors (TFs) and the genome regulates transcription. However, connecting variation in genome sequence with variation in TF binding and gene expression is challenging due to environmental differences between individuals and cell types. To address this problem, we measured genome-wide differential allelic occupancy of 24 TFs and EP300 in a human lymphoblastoid cell line GM12878. Overall, 5% of human TF binding sites have an allelic imbalance in occupancy. At many sites, TFs clustered in TF-binding hubs on the same homolog in especially open chromatin. While genetic variation in core TF binding motifs generally resulted in large allelic differences in TF occupancy, most allelic differences in occupancy were subtle and associated with disruption of weak or noncanonical motifs. We also measured genome-wide differential allelic expression of genes with and without heterozygous exonic variants in the same cells. We found that genes with differential allelic expression were overall less expressed both in GM12878 cells and in unrelated human cell lines. Comparing TF occupancy with expression, we found strong association between allelic occupancy and expression within 100 bp of transcription start sites (TSSs), and weak association up to 100 kb from TSSs. Sites of differential allelic occupancy were significantly enriched for variants associated with disease, particularly autoimmune disease, suggesting that allelic differences in TF occupancy give functional insights into intergenic variants associated with disease. Our results have the potential to increase the power and interpretability of association studies by targeting functional intergenic variants in addition to protein coding sequences.
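One common way to call an allelic imbalance at a heterozygous site is an exact binomial test of the read counts from the two alleles against an expected 50:50 ratio. The abstract does not state which test the authors used, so the sketch below is an assumption for illustration only; the function name is hypothetical, and a real analysis would also correct for reference-mapping bias and multiple testing.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Two-sided exact binomial p-value for a 50:50 allelic ratio.

    ref_reads and alt_reads are read counts supporting each allele at
    one heterozygous site (hypothetical inputs for this sketch).
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    k = min(ref_reads, alt_reads)
    # Lower tail P(X <= k) under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail and cap at 1.
    return min(1.0, 2 * tail)
```

A site with balanced counts, such as 5 and 5 reads, yields a p-value of 1.0, while a site with 10 reads on one allele and none on the other yields 2/1024.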
Extraordinary single-cell diversity is generated in the vertebrate nervous system by the combinatorial expression of the clustered protocadherin genes (Pcdhα, -β, and -γ). This diversity is generated by a combination of stochastic promoter choice and alternative pre-mRNA splicing. Here we show that both the insulator-binding protein CTCF and the cohesin complex subunit Rad21 bind to two highly conserved DNA sequences, the first within and the second downstream of transcriptionally active Pcdhα promoters. Both CTCF and Rad21 bind to these sites in vitro and in vivo; this binding directly correlates with alternative isoform expression, and knocking down CTCF expression reduces alternative isoform expression. Remarkably, a similarly spaced pair of CTCF/Rad21 binding sites was identified within a distant enhancer element (HS5-1), which is required for normal levels of alternative isoform expression. We also identify an additional, unique regulatory role for cohesin, as Rad21 binds to another enhancer (HS7) independently of CTCF, and knockdown of Rad21 reduces expression of the constitutive, biallelically expressed Pcdhα isoforms αc1 and αc2. We propose that CTCF and the cohesin complex initiate and maintain Pcdhα promoter choice by mediating interactions between Pcdhα promoters and enhancers.
Chromatin immunoprecipitation followed by next-generation DNA sequencing (ChIP-seq) is a widely used technique for identifying transcription factor (TF) binding events throughout an entire genome. However, ChIP-seq is limited by the availability of suitable ChIP-seq grade antibodies, and the vast majority of commercially available antibodies fail to generate usable data sets. To ameliorate these technical obstacles, we present a robust methodological approach for performing ChIP-seq through epitope tagging of endogenous TFs. We used clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-based genome editing technology to develop CRISPR epitope tagging ChIP-seq (CETCh-seq) of DNA-binding proteins. We assessed the feasibility of CETCh-seq by tagging several DNA-binding proteins spanning a wide range of endogenous expression levels in the hepatocellular carcinoma cell line HepG2. Our data exhibit strong correlations between both replicate types as well as with standard ChIP-seq approaches that use TF antibodies. Notably, we also observed minimal changes to the cellular transcriptome and to the expression of the tagged TF. To examine the robustness of our technique, we further performed CETCh-seq in the breast adenocarcinoma cell line MCF7 as well as mouse embryonic stem cells and observed similarly high correlations. Collectively, these data highlight the applicability of CETCh-seq to accurately define the genome-wide binding profiles of DNA-binding proteins, allowing for a straightforward methodology to potentially assay the complete repertoire of TFs, including the large fraction for which ChIP-quality antibodies are not available.