The faithful execution of biological processes requires a precise and carefully orchestrated set of steps that depend on the proper spatial and temporal expression of genes. Here we review the various classes of transcriptional regulatory elements (core promoters, proximal promoters, distal enhancers, silencers, insulators/boundary elements, and locus control regions) and the molecular machinery (general transcription factors, activators, and coactivators) that interacts with the regulatory elements to mediate precisely controlled patterns of gene expression. The biological importance of transcriptional regulation is highlighted by examples of how alterations in these transcriptional components can lead to disease. Finally, we discuss the methods currently used to identify transcriptional regulatory elements, and the ability of these methods to be scaled up for the purpose of annotating the entire human genome.
Chorionic gonadotropin (CG) is a critical signal in establishing pregnancy in humans and some other primates, but this placentally expressed hormone has not been found in other mammalian orders. The gene for one of its two subunits (CG beta subunit [CGbeta]) arose by duplication from the luteinizing hormone beta subunit gene (LHbeta), present in all mammals tested. In this study, 14 primate and related mammalian species were examined by Southern blotting and DNA sequencing to determine where in mammalian phylogeny the CGbeta gene originated. Bats (order Chiroptera), flying lemur (order Dermoptera), strepsirrhine primates, and tarsiers do not have a CGbeta gene, although they possess one copy of the LHbeta gene. The CGbeta gene first arose in the common ancestor of the anthropoid primates (New World monkeys, Old World monkeys, apes, and humans), after the anthropoids diverged from tarsiers. At least two subsequent duplication events occurred in the catarrhine primates, all of which possess multiple CGbeta copies. The LHbeta-CGbeta family of genes has undergone frequent gene conversion among the catarrhines, as well as periods of strong positive selection in the New World monkeys (platyrrhines). In addition, newly generated DNA sequences from the promoter of the CG alpha subunit gene indicate that platyrrhine monkeys use a different mechanism of alpha gene expression control than that found in catarrhines.
TATA-box-binding protein (TBP) is a highly conserved RNA polymerase II general transcription factor that binds to the core promoter and initiates assembly of the preinitiation complex. Two proteins with high homology to TBP have been found: TBP-related factor 1 (TRF1), described only in Drosophila melanogaster, and TRF2, which is broadly distributed in metazoans. Here, we report the identification and characterization of an additional TBP-related factor, TRF3. TRF3 is virtually identical to TBP in the C-terminal core domain, including all residues involved in DNA binding and interaction with other general transcription factors. Like other TBP family members, the N-terminal region of TRF3 is divergent. The TRF3 gene is present and expressed in vertebrates, from fish through humans, but absent from the genomes of the urochordate Ciona intestinalis and the lower eukaryotes D. melanogaster and Caenorhabditis elegans. TRF3 is a nuclear protein that is present in all human and mouse tissues and cell lines examined. Despite the highly homologous TBP-like C-terminal core domain, gel filtration analysis indicates that the native molecular weight of TRF3 is substantially less than that of TFIID. Interestingly, after mitosis, reimport of TRF3 into the nucleus occurs subsequent to TBP and other basal transcription factors. In summary, TRF3 is a highly conserved vertebrate-specific TRF whose phylogenetic conservation, expression pattern, and other properties are distinct from those of TBP and all other TRFs.
There has been a recent surge in the use of genome-wide methodologies to identify and annotate the transcriptional regulatory elements in the human genome. Here we review some of these methodologies and the conceptual insights about transcription regulation that have been gained from the use of genome-wide studies. It has become clear that the binding of transcription factors is itself a highly regulated process, and binding does not always appear to have functional consequences. Numerous properties have now been associated with regulatory elements that may be useful in their identification. Several aspects of enhancer function have been shown to be more widespread than was previously appreciated, including the highly combinatorial nature of transcription factor binding, the postinitiation regulation of many target genes, and the binding of enhancers at early stages to maintain their competence during development. Going forward, the integration of multiple genome-wide data sets should become a standard approach to elucidate higher-order regulatory interactions.
We developed a rules-based scoring system to classify DNA variants into five categories: pathogenic, likely pathogenic, variant of uncertain significance (VUS), likely benign, and benign. Over 16,500 pathogenicity assessments on 11,894 variants from 338 genes were analyzed based on prediction tools, population frequency, co-occurrence, segregation, and functional studies collected from internal and external sources. Scores were calculated by trained scientists using a quantitative framework that assigned differential weighting to these five types of data. We performed descriptive and comparative statistics on the dataset and tested interobserver concordance among the trained scientists. Private variants, defined as variants found within single families (n = 5,182), were either VUS (80.5%; n = 4,169) or likely pathogenic (19.5%; n = 1,013). The remaining variants (n = 6,712) were VUS (38.4%; n = 2,577), likely benign/benign (34.7%; n = 2,327), or likely pathogenic/pathogenic (26.9%; n = 1,808). Exact agreement between the trained scientists on the final variant score was 98.5% [95% confidence interval (CI) (98.0, 98.9)] with an interobserver consistency of 97% [95% CI (91.5, 99.4)]. Variant scores were stable and showed increasing odds of being in agreement with new data when re-evaluated periodically. This carefully curated, standardized variant pathogenicity scoring system provides reliable pathogenicity scores for DNA variants encountered in a clinical laboratory setting.
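The abstract above describes a quantitative framework that weights five types of evidence and maps the result onto five categories, but it does not give the weights or cut-offs. The following is a minimal sketch of how such a rules-based scorer could be structured, not the authors' implementation. The weights in `WEIGHTS`, the score cut-offs in `classify`, and the convention of coding each evidence type as -1 (benign-leaning), 0 (uninformative), or +1 (pathogenic-leaning) are all hypothetical choices made for illustration.

```python
# Hypothetical weights for the five evidence types named in the abstract.
# The real weights are not stated in the text; these are placeholders.
WEIGHTS = {
    "prediction_tools": 1.0,
    "population_frequency": 2.0,
    "co_occurrence": 2.0,
    "segregation": 3.0,
    "functional_studies": 4.0,
}


def weighted_score(evidence):
    """Sum weight * direction over the supplied evidence types.

    `evidence` maps an evidence type to -1 (benign-leaning),
    0 (uninformative) or +1 (pathogenic-leaning). This coding is an
    assumption of the sketch, not something the abstract specifies.
    """
    total = 0.0
    for kind, direction in evidence.items():
        if kind not in WEIGHTS:
            raise KeyError(f"unknown evidence type: {kind}")
        if direction not in (-1, 0, 1):
            raise ValueError(f"direction must be -1, 0 or 1, got {direction}")
        total += WEIGHTS[kind] * direction
    return total


def classify(evidence):
    """Map a weighted score onto the five categories (hypothetical cut-offs)."""
    score = weighted_score(evidence)
    if score >= 6.0:
        return "pathogenic"
    if score >= 3.0:
        return "likely pathogenic"
    if score > -3.0:
        return "variant of uncertain significance"
    if score > -6.0:
        return "likely benign"
    return "benign"


if __name__ == "__main__":
    example = {"segregation": 1, "functional_studies": 1}
    print(classify(example))
```

With these placeholder values, a variant with no informative evidence scores 0 and falls into the VUS category, which matches the intuition that sparsely observed variants tend to remain unclassified until more data accumulate.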
The general transcription factor TFIID comprises the TATA-box-binding protein (TBP) and approximately 14 TBP-associated factors (TAFs). Here we find, unexpectedly, that undifferentiated human embryonic stem cells (hESCs) contain only six TAFs (TAFs 2, 3, 5, 6, 7 and 11), whereas following differentiation all TAFs are expressed. Directed and global chromatin immunoprecipitation analyses reveal an unprecedented promoter occupancy pattern: most active genes are bound by only TAFs 3 and 5 along with TBP, whereas the remaining active genes are bound by TBP and all six hESC TAFs. Consistent with these results, hESCs contain a previously undescribed complex comprising TAFs 2, 6, 7, 11 and TBP. Altering the composition of hESC TAFs, either by depleting TAFs that are present or ectopically expressing TAFs that are absent, results in misregulated expression of pluripotency genes and induction of differentiation. Thus, the selective expression and use of TAFs underlies the ability of hESCs to self-renew. DOI: http://dx.doi.org/10.7554/eLife.00068.001