Eukaryotic gene transcription is accompanied by acetylation and methylation of nucleosomes near promoters, but the locations and roles of histone modifications elsewhere in the genome remain unclear. We determined the chromatin modification states in high resolution along 30 Mb of the human genome and found that active promoters are marked by trimethylation of Lys4 of histone H3 (H3K4), whereas enhancers are marked by monomethylation, but not trimethylation, of H3K4. We developed computational algorithms using these distinct chromatin signatures to identify new regulatory elements, predicting over 200 promoters and 400 enhancers within the 30-Mb region. This approach accurately predicted the location and function of independently identified regulatory elements with high sensitivity and specificity and uncovered a novel functional enhancer for the carnitine transporter SLC22A5 (OCTN2). Our results give insight into the connections between chromatin modifications and transcriptional regulatory activity and provide a new tool for the functional annotation of the human genome.
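The abstract above does not describe the prediction algorithms themselves, only the signatures they rely on: H3K4 trimethylation at active promoters, and H3K4 monomethylation without trimethylation at enhancers. As a rough illustration of that contrast only, a toy rule might look like the sketch below. The function name, the enrichment inputs and the cutoff value are all hypothetical and are not taken from the paper.

```python
def classify_region(h3k4me1: float, h3k4me3: float, cutoff: float = 1.0) -> str:
    """Toy rule based on the signatures named in the abstract.

    h3k4me1, h3k4me3: hypothetical enrichment scores (e.g. log ratios of
    ChIP signal over input). The cutoff is an arbitrary placeholder.
    """
    if h3k4me3 >= cutoff:
        # Trimethylation of H3K4 marks active promoters.
        return "promoter-like"
    if h3k4me1 >= cutoff:
        # Monomethylation without trimethylation marks enhancers.
        return "enhancer-like"
    return "unmarked"


if __name__ == "__main__":
    print(classify_region(h3k4me1=0.2, h3k4me3=2.5))
    print(classify_region(h3k4me1=1.8, h3k4me3=0.1))
```

A real predictor would presumably use the shape of the signal across a window rather than a single score per region; this sketch only encodes the qualitative distinction stated in the text.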
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
In eukaryotic cells, transcription of every protein-coding gene begins with the assembly of an RNA Polymerase II preinitiation complex (PIC) on the promoter 1 . The promoters, in conjunction with enhancers, silencers and insulators, define the combinatorial codes that specify gene expression patterns 2 . Our ability to analyze the control logic encoded in the human genome is currently limited by a lack of accurate information on the promoters of most genes 3 . Here, we describe a genome-wide map of active promoters in human fibroblast cells, determined by experimentally locating the sites of PIC binding throughout the human genome. This map defines 10,571 active promoters corresponding to 6,763 known genes and at least 1,199 un-annotated transcriptional units. Features of the map suggest extensive usage of multiple promoters by human genes and widespread clustering of active promoters in the genome. In addition, examination of the genome-wide expression profile reveals four general classes of promoters that define the transcriptome of the cell. These results provide a global view of the functional relationship among the transcriptional machinery, chromatin structure and gene expression in human cells.
The PIC consists of RNA Polymerase II (RNAP), transcription factor IID (TFIID) and other general transcription factors 4 . Our strategy to map the PIC binding sites involves chromatin immunoprecipitation coupled with DNA microarray analysis (ChIP-on-chip), which combines the immunoprecipitation of PIC-bound chromatin from formaldehyde-crosslinked cells with parallel identification of the resulting bound DNA sequences using DNA microarrays 5,6 . Previously, we demonstrated the feasibility of this strategy by successfully mapping active promoters in 1% of the human genome, corresponding to the 44 genomic loci known as the ENCODE regions 6,7 .
To apply this strategy to the entire human genome, we fabricated a series of DNA microarrays 8 containing roughly 14.5 million 50-mer oligonucleotides, designed to represent all the non-repeat DNA throughout the human genome at 100 basepairs (bp) resolution. We immunoprecipitated TFIID-bound DNA from primary fibroblast IMR90 cells with a monoclonal antibody that specifically recognizes the TAF1 subunit of this complex (TBP associated factor 1, formerly TAF II 250 9 ; Fig. 1a). We then amplified and fluorescently labeled the resulting DNA, and hybridized it to the above microarrays along with a differentially labeled control DNA (Fig. 1a). We determined 9,966 potential TFIID-binding regions using a simple algorithm requiring a stretch of four neighboring probes to have a hybridization signal significantly above the background. To
5 These two authors contributed equally to this work.
6 To whom correspondence and material requests should be addressed: Bing Ren. Email: biren@ucsd.edu. Phone: 858 822 5766; Fax: 858 534 7750.
The microarray datasets are available from GEO (accession numbers to be ...
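The peak-calling rule described above (a stretch of four neighboring probes with signal significantly above background) can be sketched as a simple run-length scan. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not say how "significantly above the background" was tested, so a fixed `threshold` stands in for that test, and the function name and inputs are hypothetical. Probes are assumed to be sorted by genomic position and adjacent on the array tiling.

```python
from typing import List, Sequence, Tuple


def find_bound_regions(
    signals: Sequence[float],
    threshold: float,
    min_run: int = 4,
) -> List[Tuple[int, int]]:
    """Return (start, end) probe-index pairs, end exclusive, for every run
    of at least `min_run` consecutive probes whose signal exceeds `threshold`.

    `threshold` is a placeholder for a proper significance test against
    background, which the source text does not specify.
    """
    regions: List[Tuple[int, int]] = []
    run_start = None  # index where the current above-threshold run began
    for i, value in enumerate(signals):
        if value > threshold:
            if run_start is None:
                run_start = i
        else:
            # The run (if any) ended at probe i - 1; keep it if long enough.
            if run_start is not None and i - run_start >= min_run:
                regions.append((run_start, i))
            run_start = None
    # Handle a run that extends to the last probe.
    if run_start is not None and len(signals) - run_start >= min_run:
        regions.append((run_start, len(signals)))
    return regions


if __name__ == "__main__":
    # Hypothetical log-ratio signals for 14 consecutive probes.
    example = [0.1, 2.5, 2.7, 3.0, 2.9, 0.2, 2.6, 2.8, 0.1, 3.1, 3.2, 3.3, 3.4, 3.5]
    print(find_bound_regions(example, threshold=2.0))
```

With probes tiled at 100 bp, a run of four probes corresponds to a few hundred base pairs of enriched signal, which is why short two-probe spikes in the example are not reported.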
By integrating genome-wide maps of RNA polymerase II (Polr2a) binding with gene expression data and H3ac and H3K4me3 profiles, we characterized promoters with enriched activity in mouse embryonic stem cells (mES) as well as adult brain, heart, kidney, and liver. We identified ∼24,000 promoters across these samples, including 16,976 annotated mRNA 5′ ends and 5153 additional sites validating cap-analysis of gene expression (CAGE) 5′ end data. We showed that promoters with CpG islands are typically non-tissue specific, with the majority associated with Polr2a and the active chromatin modifications in nearly all the tissues examined. By contrast, the promoters without CpG islands are generally associated with Polr2a and the active chromatin marks in a tissue-dependent way. We defined 4396 tissue-specific promoters by adapting a quantitative index of tissue-specificity based on Polr2a occupancy. While there is a general correspondence between Polr2a occupancy and active chromatin modifications at the tissue-specific promoters, a subset of them appear to be persistently marked by active chromatin modifications in the absence of detectable Polr2a binding, highlighting the complexity of the functional relationship between chromatin modification and gene expression. Our results provide a resource for exploring promoter Polr2a binding and epigenetic states across pluripotent and differentiated cell types in mammals.
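The abstract above mentions a quantitative index of tissue specificity based on Polr2a occupancy but does not define it. One common way to build such an index, used here purely as an assumption, is the Shannon entropy of a promoter's occupancy distribution across tissues: low entropy means occupancy is concentrated in few tissues. The function name and the example values below are hypothetical.

```python
import math
from typing import Mapping


def occupancy_entropy(occupancy: Mapping[str, float]) -> float:
    """Shannon entropy (in bits) of relative occupancy across tissues.

    `occupancy` maps tissue name to a non-negative occupancy score.
    0 bits means the signal is confined to one tissue; log2(n) bits means
    it is spread evenly over all n tissues. Using entropy here is an
    assumption, not a detail given in the abstract.
    """
    total = sum(occupancy.values())
    if total <= 0:
        raise ValueError("total occupancy must be positive")
    entropy = 0.0
    for value in occupancy.values():
        p = value / total
        if p > 0:  # terms with p == 0 contribute nothing
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    tissues = ["mES", "brain", "heart", "kidney", "liver"]
    ubiquitous = {t: 1.0 for t in tissues}
    liver_only = {t: 0.0 for t in tissues}
    liver_only["liver"] = 5.0
    print(occupancy_entropy(ubiquitous))
    print(occupancy_entropy(liver_only))
```

Under this sketch, a promoter would be called tissue-specific when its entropy falls below some chosen cutoff; the cutoff, like the index itself, would need to be taken from the paper's methods rather than from this example.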