We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
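Saying that most bases lie in primary transcripts is, at bottom, an interval-coverage calculation. The sketch below is illustrative only and is not the ENCODE pipeline. It assumes transcripts are half-open (start, end) intervals on a single chromosome, and the helper names `merge_intervals` and `covered_fraction` are hypothetical. Merging first ensures that bases shared by overlapping transcripts are counted once.

```python
from typing import List, Tuple

# Half-open [start, end) coordinates on a single chromosome (assumption).
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or abutting intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # This interval touches the previous block, so widen that block.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(region: Interval, transcripts: List[Interval]) -> float:
    """Fraction of bases in `region` lying inside at least one transcript."""
    r_start, r_end = region
    # Clip each transcript to the region and drop those entirely outside it.
    clipped = [
        (max(s, r_start), min(e, r_end))
        for s, e in transcripts
        if s < r_end and e > r_start
    ]
    # Merging first means bases shared by overlapping transcripts count once.
    covered = sum(e - s for s, e in merge_intervals(clipped))
    return covered / (r_end - r_start)
```

For example, `covered_fraction((0, 1000), [(100, 400), (300, 700), (900, 1200)])` gives 0.7. The two overlapping transcripts contribute 600 bases once, and the third is clipped to the 100 bases inside the region.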
Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein-coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (∼80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and that processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. To explore the functional implications of pseudogene prevalence, we extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed a systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high-throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.
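Because different pipelines return different pseudogene sets, one simple way to illustrate a consensus is to pool all calls, cluster overlapping ones, and keep clusters supported by several methods. This is a hypothetical majority-vote sketch, not the consensus procedure used in the study. The method names, the half-open (chromosome, start, end) call format, and the `min_support` threshold are all assumptions.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) with half-open coordinates (assumption).
Call = Tuple[str, int, int]


def consensus_calls(calls_by_method: Dict[str, List[Call]],
                    min_support: int = 2) -> List[Tuple[str, int, int, int]]:
    """Cluster overlapping calls from all methods and keep clusters that are
    supported by at least `min_support` distinct methods.

    Returns (chromosome, cluster_start, cluster_end, number_of_methods).
    """
    # Pool every call, tagged with the method that produced it, in genome order.
    pooled = sorted(
        (chrom, start, end, method)
        for method, calls in calls_by_method.items()
        for chrom, start, end in calls
    )
    clusters = []  # each entry: [chrom, start, end, set_of_methods]
    for chrom, start, end, method in pooled:
        last = clusters[-1] if clusters else None
        if last and last[0] == chrom and start < last[2]:
            # Overlaps the open cluster: extend it and record the method.
            last[2] = max(last[2], end)
            last[3].add(method)
        else:
            clusters.append([chrom, start, end, {method}])
    return [(c, s, e, len(m)) for c, s, e, m in clusters if len(m) >= min_support]
```

With three hypothetical pipelines "A", "B" and "C", a locus called by all three survives, while a locus reported by only one of them is dropped at the default threshold.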
Background: The regulation of specific target genes by transcription factors is central to our understanding of gene network control in developmental and physiological processes, yet how target specificity is achieved is still poorly understood. This is well illustrated by the Hox family of transcription factors, whose limited in vitro DNA-binding specificity contrasts with their clear in vivo functional specificity.
Results: We generated genome-wide binding profiles for three Hox proteins, Ubx, Abd-A and Abd-B, following transient expression in Drosophila Kc167 cells, revealing clear target specificity and a striking influence of chromatin accessibility. In the absence of the TALE-class homeodomain cofactors Exd and Hth, Ubx and Abd-A bind at a very similar set of target sites in accessible chromatin, whereas Abd-B binds at an additional specific set of targets. Provision of the Hox cofactors Exd and Hth considerably modifies the Ubx genome-wide binding profile, enabling Ubx to bind at an additional novel set of targets. Both the Abd-B-specific targets and the cofactor-dependent Ubx targets lie in chromatin that is relatively DNase I-inaccessible prior to the expression of Hox proteins/Hox cofactors.
Conclusions: Our experiments demonstrate a strong role for chromatin accessibility in Hox protein binding and suggest that Hox protein competition with nucleosomes has a major role in Hox protein target specificity in vivo.
Electronic supplementary material: The online version of this article (doi:10.1186/s13072-015-0049-x) contains supplementary material, which is available to authorized users.
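Relating binding sites to prior chromatin accessibility reduces to asking, for each peak, whether it overlaps a DNase I hypersensitive site measured before Hox expression. The following is a minimal sketch under stated assumptions: peaks and hypersensitive sites are half-open intervals on one chromosome, and `AccessibilityIndex` and `fraction_accessible` are hypothetical names, not tools from the study. A lower accessible fraction for Abd-B-specific peaks than for shared Ubx/Abd-A peaks would correspond to the pattern described above.

```python
from bisect import bisect_right
from typing import List, Tuple

# Half-open [start, end) coordinates on a single chromosome (assumption).
Interval = Tuple[int, int]


def _merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or abutting intervals."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


class AccessibilityIndex:
    """Answers: does a peak overlap any DNase I hypersensitive site (DHS)?"""

    def __init__(self, dhs: List[Interval]):
        merged = _merge(dhs)
        # After merging, both starts and ends are strictly increasing.
        self.starts = [s for s, _ in merged]
        self.ends = [e for _, e in merged]

    def overlaps(self, peak: Interval) -> bool:
        s, e = peak
        # Index of the rightmost DHS that starts before the peak ends.
        i = bisect_right(self.starts, e - 1) - 1
        # Earlier DHSs end even sooner, so checking this one is sufficient.
        return i >= 0 and self.ends[i] > s


def fraction_accessible(peaks: List[Interval], index: AccessibilityIndex) -> float:
    """Share of peaks that fall in pre-existing accessible chromatin."""
    if not peaks:
        return 0.0
    return sum(index.overlaps(p) for p in peaks) / len(peaks)
```

Merging the hypersensitive sites up front is the design choice that allows a single binary search per peak instead of a scan over all sites.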
Hox genes encode a family of transcription factors that are key developmental regulators with a highly conserved role in specifying segmental diversity along the metazoan body axis. Although they have been shown to regulate a wide variety of downstream processes, direct transcriptional targets have been difficult to identify, and this has been a major obstacle to our understanding of Hox gene function. We report the identification of genome-wide binding sites for the Hox protein Ultrabithorax (Ubx) using a YFP-tagged Drosophila protein-trap line together with chromatin immunoprecipitation and microarray analysis. We identify 1,147 genes bound by Ubx at high confidence in chromatin from the haltere imaginal disc, a prominent site of Ubx function where it specifies haltere versus wing development. The functional relevance of these genes is supported by their overlap with genes differentially expressed between wing and haltere imaginal discs. The Ubx-bound gene set is highly enriched in genes involved in developmental processes and contains both high-level regulators and genes involved in more basic cellular functions. Several signalling pathways are highly enriched in the Ubx target gene set, and our analysis supports the view that Hox genes regulate many levels of developmental pathways and have targets distributed throughout the gene network. We also performed genome-wide analysis of the binding sites for the Hox cofactor Homothorax (Hth), revealing a striking similarity with the Ubx binding profile. We suggest that these binding profiles may be strongly influenced by chromatin accessibility and provide evidence of a link between Ubx/Hth binding and chromatin state at genes regulated by Polycomb silencing. Overall, we define a set of direct Ubx targets in the haltere imaginal disc and suggest that chromatin accessibility has important implications for Hox target selection and for transcription factor binding in general.
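An overlap between the Ubx-bound genes and the wing/haltere differentially expressed genes is only informative relative to what chance would produce. A common way to quantify this, offered here as an assumption rather than as the test the authors used, is a one-sided hypergeometric test over a defined gene universe. `hypergeom_sf` and `overlap_enrichment` are hypothetical helper names, and the gene sets would come from the ChIP-chip and expression data.

```python
from math import comb
from typing import Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K marked genes, n drawn).

    Uses exact integer binomials; math.comb returns 0 when the lower argument
    exceeds the upper one, so impossible terms drop out automatically.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def overlap_enrichment(bound: Set[str], de: Set[str],
                       universe: Set[str]) -> Tuple[int, float, float]:
    """Return (observed overlap, expected overlap, one-sided p-value)."""
    # Restrict both lists to genes that could have appeared in either assay.
    bound, de = bound & universe, de & universe
    k = len(bound & de)
    expected = len(bound) * len(de) / len(universe)
    return k, expected, hypergeom_sf(k, len(universe), len(de), len(bound))
```

Exact integer arithmetic avoids the underflow that naive floating-point factorials would hit at genome scale; Python's true division of large integers is correctly rounded.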
Identification of unconventional functional features such as fusion transcripts is a challenging task in the effort to annotate all functional DNA elements in the human genome. Paired-End diTag (PET) analysis possesses a unique capability to accurately and efficiently characterize the two ends of DNA fragments, which may have either normal or unusual compositions. This property makes PET analysis an ideal tool for uncovering unconventional features residing in the human genome. Using the PET approach for comprehensive transcriptome analysis, we were able to identify fusion transcripts derived from genome rearrangements and actively expressed retrotransposed pseudogenes, which would be difficult to capture by other means. Here, we demonstrate this capability through the analysis of 865,000 individual transcripts in two types of cancer cells. In addition to characterizing a large number of differentially expressed alternative 5′ and 3′ transcript variants and novel transcriptional units, we identified 70 fusion transcript candidates in this study. One was validated as the product of a fusion gene between BCAS4 and BCAS3, resulting from an amplification followed by a translocation event between the two loci, chr20q13 and chr17q23. Through an examination of PETs that mapped to multiple genomic locations, we identified 4055 retrotransposed loci in the human genome, of which at least three were found to be transcriptionally active. The PET mapping strategy presented here promises to be a useful tool in annotating the human genome, especially aberrations in human cancer genomes.
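PETs expose fusion transcripts because the 5′ and 3′ tags of a normal transcript map close together, on the same chromosome and strand and in the expected order, whereas the tags of a fusion transcript map to distant loci. The sketch below classifies mapped PETs on that basis and groups discordant ones by locus pair. It is illustrative only: it assumes each tag hit is reported in transcript orientation, and the `MAX_SPAN` cutoff, bin size, minimum PET count and all names are hypothetical rather than parameters from the study.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class TagHit:
    chrom: str
    pos: int     # genomic coordinate of the mapped tag
    strand: str  # '+' or '-', in transcript orientation (assumption)


@dataclass(frozen=True)
class PET:
    five_prime: TagHit
    three_prime: TagHit


MAX_SPAN = 1_000_000  # hypothetical upper bound for one transcriptional unit


def classify_pet(pet: PET, max_span: int = MAX_SPAN) -> str:
    a, b = pet.five_prime, pet.three_prime
    if a.chrom != b.chrom or a.strand != b.strand:
        return "fusion_candidate"  # different chromosomes or orientations
    # On '+' the 3' tag should lie downstream (higher coordinate); on '-' lower.
    span = b.pos - a.pos if a.strand == "+" else a.pos - b.pos
    if 0 <= span <= max_span:
        return "contiguous"
    return "fusion_candidate"      # wrong order or implausibly long span


def fusion_clusters(pets: Iterable[PET], bin_size: int = 10_000,
                    min_pets: int = 2) -> Dict[Tuple[str, int, str, int], int]:
    """Count discordant PETs per (5' bin, 3' bin) pair; keep well-supported pairs."""
    counts: Counter = Counter()
    for p in pets:
        if classify_pet(p) == "fusion_candidate":
            key = (p.five_prime.chrom, p.five_prime.pos // bin_size,
                   p.three_prime.chrom, p.three_prime.pos // bin_size)
            counts[key] += 1
    return {k: n for k, n in counts.items() if n >= min_pets}
```

Requiring several independent PETs per locus pair is one way to separate recurrent fusion events from isolated mapping or ligation artefacts.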