The sifting and winnowing of DNA sequence that occur during evolution cause nonfunctional sequences to diverge, leaving phylogenetic footprints of functional sequence elements in comparisons of genome sequences. We searched for such footprints among the genome sequences of six Saccharomyces species and identified potentially functional sequences. Comparison of these sequences allowed us to revise the catalog of yeast genes and identify sequence motifs that may be targets of transcriptional regulatory proteins. Some of these conserved sequence motifs reside upstream of genes that share functional annotations or expression patterns, or that are bound by the same transcription factor, and are thus good candidates for functional regulatory sequences.
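The core idea of phylogenetic footprinting, finding stretches where aligned orthologous sequences have not diverged, can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the upstream regions are already aligned (equal-length strings, `-` for gaps), and the window size, identity threshold, and function names are hypothetical choices.

```python
from typing import List, Tuple


def conserved_columns(alignment: List[str]) -> List[bool]:
    """Flag alignment columns where every species has the same non-gap base."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    flags = []
    for i in range(length):
        column = {seq[i].upper() for seq in alignment}
        flags.append(len(column) == 1 and "-" not in column)
    return flags


def footprints(alignment: List[str], window: int = 6,
               min_identity: float = 1.0) -> List[Tuple[int, int]]:
    """Return merged (start, end) intervals, end exclusive, covering windows
    whose fraction of fully conserved columns is at least min_identity."""
    flags = conserved_columns(alignment)
    merged: List[Tuple[int, int]] = []
    for start in range(len(flags) - window + 1):
        if sum(flags[start:start + window]) / window < min_identity:
            continue
        end = start + window
        if merged and start <= merged[-1][1]:
            # Overlapping or touching windows are merged into one footprint.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Toy alignment of three hypothetical orthologous upstream regions.
    aln = ["ATGCCACGTGAA", "TTGCCACGTGAC", "ATACCACGTGAG"]
    for start, end in footprints(aln):
        print(start, end, aln[0][start:end])  # prints: 3 11 CCACGTGA
```

Requiring perfect identity across all species is the strictest possible criterion; lowering `min_identity` trades specificity for sensitivity when fewer or more distant species are compared.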
Chromosome correlation maps display correlations between the expression patterns of genes on the same chromosome. Using these maps, we show here that adjacent pairs of genes, as well as nearby non-adjacent pairs of genes, show correlated expression independent of their orientation. We present specific examples of adjacent pairs with highly correlated expression patterns, in which the promoter of only one of the two genes contains an upstream activating sequence (UAS) known to be associated with that expression pattern. Finally, we show that genes with similar functions tend to occur in adjacent positions along the chromosomes. Our results suggest that, in certain chromosomal expression domains, a UAS can affect the transcription of genes that are not immediately downstream of it.
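A chromosome correlation map can be pictured as an all-by-all correlation matrix of genes listed in chromosomal order, and the adjacency effect as the average correlation of pairs separated by a given number of gene positions. The sketch below assumes Pearson correlation as the similarity measure and expression profiles given as plain lists (for example, log ratios across conditions); the function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two expression profiles (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_map(profiles: List[List[float]]) -> List[List[float]]:
    """All-by-all correlation matrix for genes listed in chromosomal order."""
    return [[pearson(p, q) for q in profiles] for p in profiles]


def mean_correlation_by_distance(profiles: List[List[float]],
                                 max_distance: int = 3) -> Dict[int, float]:
    """Average correlation of gene pairs separated by d positions (d = 1 is adjacent)."""
    result = {}
    for d in range(1, max_distance + 1):
        values = [pearson(profiles[i], profiles[i + d])
                  for i in range(len(profiles) - d)]
        if values:
            result[d] = sum(values) / len(values)
    return result


if __name__ == "__main__":
    # Four hypothetical genes in chromosomal order, four conditions each.
    genes = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]
    print(mean_correlation_by_distance(genes))
```

Comparing the d = 1 average with averages for larger d (or with randomly shuffled gene orders) is one way to ask whether neighbors are more correlated than expected.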
A network of interacting proteins controls the activity of cyclin-dependent kinase 2 (Cdk2) (refs 1, 2) and governs the entry of higher eukaryotic cells into S phase. Analysis of this and other genetic regulatory networks would be facilitated by intracellular reagents that recognize specific targets and inhibit specific network connections. We report here the expression of a combinatorial library of constrained 20-residue peptides displayed by the active-site loop of Escherichia coli thioredoxin, and the use of a two-hybrid system to select those that bind human Cdk2. These peptide aptamers were designed to mimic the recognition function of the complementarity-determining regions of immunoglobulins. The aptamers recognized different epitopes on the Cdk2 surface with equilibrium dissociation constants in the nanomolar range; those tested inhibited Cdk2 activity. Our results show that peptide aptamers bear some analogies with monoclonal antibodies, with the advantages that they are isolated together with their coding genes, that their small size should allow their structures to be solved, and that they are designed to function inside cells.
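What a nanomolar equilibrium dissociation constant implies can be made concrete with the standard one-site binding relation, fraction bound = [L] / ([L] + K_d), so half of the target is occupied when the binder concentration equals K_d. The sketch assumes simple 1:1 binding with the binder in excess (free concentration approximately equal to total); the 10 nM K_d is a hypothetical value, not one measured in the study.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of target (e.g. Cdk2) occupied by a 1:1 binder.

    Assumes simple one-site binding and that free ligand is approximately
    equal to total ligand (ligand in large excess over target).
    """
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    return ligand_nM / (ligand_nM + kd_nM)


if __name__ == "__main__":
    kd = 10.0  # hypothetical Kd in nM, not a value measured in the study
    for conc in (1.0, 10.0, 90.0):
        print(f"{conc:>5} nM aptamer -> {fraction_bound(conc, kd):.2f} bound")
```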
The histone modification state of genomic regions is hypothesized to reflect the regulatory activity of the underlying genomic DNA. Based on this hypothesis, the ENCODE Project Consortium measured the status of multiple histone modifications across the genome in several cell types and used these data to segment the genome into regions with different predicted regulatory activities. We measured the cis-regulatory activity of more than 2000 of these predictions in the K562 leukemia cell line. We tested genomic segments predicted to be Enhancers, Weak Enhancers, or Repressed elements in K562 cells, along with other sequences predicted to be Enhancers specific to the H1 human embryonic stem cell line (H1-hESC). Both Enhancer and Weak Enhancer sequences in K562 cells were more active than negative controls, although, surprisingly, Weak Enhancer segmentations drove higher expression than Enhancer segmentations did. Lower levels of the covalent histone modifications H3K27ac and H3K36me3, thought to mark active enhancers and transcribed gene bodies, respectively, associate with higher expression and partly explain the higher activity of Weak Enhancers over Enhancer predictions. While DNase I hypersensitivity (HS) is a good predictor of active sequences in our assay, transcription factor (TF) binding models must be included to accurately identify highly expressed sequences. Overall, our results show that a significant fraction (~26%) of the ENCODE enhancer predictions have regulatory activity, suggesting that histone modification states can reflect the cis-regulatory activity of sequences in the genome, but that specific sequence preferences, such as TF-binding sites, are the causal determinants of cis-regulatory activity.
[Supplemental material is available for this article.]
It is widely reported that specific combinations of covalent histone modifications reflect the regulatory function of underlying genomic DNA sequence (Strahl and Allis 2000).
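Statements such as "more active than negative controls" and "a fraction of predictions have regulatory activity" require some rule for calling a tested sequence active. One simple rule, shown here purely as an assumption and not necessarily the criterion behind the ~26% figure, is to call a sequence active when it exceeds a high percentile (here the 95th, nearest-rank) of the negative-control distribution. All activity values below are toy numbers, not data from the study.

```python
import math
from typing import Dict, List


def nearest_rank_percentile(values: List[float], q: float) -> float:
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def fraction_active(test: List[float], controls: List[float], q: float = 95.0) -> float:
    """Share of test sequences whose activity exceeds the q-th percentile of controls."""
    threshold = nearest_rank_percentile(controls, q)
    return sum(value > threshold for value in test) / len(test)


def summarize(classes: Dict[str, List[float]], controls: List[float]) -> Dict[str, float]:
    """Fraction active for each predicted class (e.g. Enhancer, Weak Enhancer)."""
    return {name: fraction_active(values, controls) for name, values in classes.items()}


if __name__ == "__main__":
    # Toy activity values (e.g. normalized RNA/DNA ratios); not data from the study.
    controls = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 0.95, 1.05, 0.85, 1.15]
    classes = {"Enhancer": [0.9, 1.5, 2.0, 1.1, 3.0],
               "Repressed": [0.7, 1.0, 1.3, 0.9, 0.8]}
    print(summarize(classes, controls))
```

A percentile threshold keeps the expected false-positive rate near 5% if the controls represent inactive sequence well; a rank-based test between class and control distributions would be an alternative.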
As part of the ENCODE Project, the genomic locations of a variety of covalent histone modifications were determined by chromatin immunoprecipitation sequencing (ChIP-seq) in a number of cell types and cell lines. Two studies used these data to train computational models that predict different functional regions of the human genome. These unsupervised learning algorithms, Segway (Hoffman et al. 2012) and ChromHMM (Ernst and Kellis 2010, 2012), take functional genomics data as input (DNase-seq; FAIRE-seq; and ChIP-seq of histone modifications, RNA polymerase II large subunit [POLR2A], and CTCF) and return segmentation classes, which are then assigned a hypothesized function using current knowledge of histone modification function. As part of the ENCODE Project, these two sets of predictions were consolidated to create a unified annotation of the entire human genome with seven functional classes in multiple cell types. These segmentations include Transcription Start Site, Promoter Flanking, Transcribed, CTCF-bound, Enhancer, Weak Enhancer, and Repressed or Inactive segments (The ENCODE Project ...
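The general idea of segmentation, labeling each genomic bin with a hidden class inferred from several marks, can be illustrated with a toy two-state hidden Markov model decoded by the Viterbi algorithm. This is far simpler than Segway or ChromHMM: unlike the published tools, its parameters are set by hand rather than learned from data. The state names, the choice of two binarized inputs (DNase and H3K27ac), and every probability are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi(observations: List[Tuple[int, ...]], states: Sequence[str],
            start: Dict[str, float], trans: Dict[str, Dict[str, float]],
            emit: Dict[str, List[float]]) -> List[str]:
    """Most likely state path for binarized marks (1 = mark present in bin).

    emit[state][m] is the probability that mark m is present in that state;
    marks are treated as independent Bernoulli variables given the state.
    """
    def log_emission(state: str, obs: Tuple[int, ...]) -> float:
        return sum(math.log(p if present else 1.0 - p)
                   for p, present in zip(emit[state], obs))

    scores = [{s: math.log(start[s]) + log_emission(s, observations[0]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(observations)):
        scores.append({})
        back.append({})
        for s in states:
            # Best predecessor state r for entering state s at bin t (log space).
            prev, best = max(((r, scores[t - 1][r] + math.log(trans[r][s])) for r in states),
                             key=lambda item: item[1])
            scores[t][s] = best + log_emission(s, observations[t])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    states = ["Enhancer", "Inactive"]  # hypothetical labels
    start = {"Enhancer": 0.5, "Inactive": 0.5}
    trans = {"Enhancer": {"Enhancer": 0.9, "Inactive": 0.1},
             "Inactive": {"Enhancer": 0.1, "Inactive": 0.9}}
    # Mark order (assumed): (DNase, H3K27ac); probabilities are illustrative.
    emit = {"Enhancer": [0.9, 0.8], "Inactive": [0.05, 0.1]}
    bins = [(0, 0), (0, 0), (1, 1), (1, 0), (1, 1), (0, 0)]
    print(viterbi(bins, states, start, trans, emit))
```

The high self-transition probability is what lets the bin with only one mark present stay inside the surrounding "Enhancer" segment instead of flipping state.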
Cis-regulatory elements (CREs) control gene expression by recruiting transcription factors (TFs) and other DNA-binding proteins. We aim to understand how individual nucleotides contribute to the function of CREs. Here we introduce CRE analysis by sequencing (CRE-seq), a high-throughput method for producing and testing large numbers of reporter genes in mammalian cells. We used CRE-seq to assay >1,000 single- and double-nucleotide mutations in a 52-bp CRE in the Rhodopsin promoter that drives strong and specific expression in mammalian photoreceptors. We find that this particular CRE is remarkably complex. The majority (86%) of single-nucleotide substitutions in this sequence exert significant effects on regulatory activity. Although changes in the affinity of known TF binding sites explain some of these expression changes, we present evidence for complex phenomena, including binding site turnover and TF competition. Analysis of double mutants revealed complex, nucleotide-specific interactions between residues in different TF binding sites. We conclude that some mammalian CREs are finely tuned by evolution and function through complex, nonadditive interactions between bound TFs. CRE-seq will be an important tool to uncover the rules that govern these interactions.
Mutations in cis-regulatory elements (CREs) often have unexpected effects on gene regulation. We lack models with the predictive power to accurately interpret the functional consequences of noncoding polymorphisms. More generally, we do not understand the nucleotide-level architecture that distinguishes true CREs from nonfunctional groupings of transcription factor (TF) binding sites (TFBS). Although consortium-driven efforts continue to predict that large numbers of mammalian sequences are CREs (1, 2), we lack a corresponding high-throughput method for functionally analyzing the consequences of variants in these elements.
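The size of the mutational space and the meaning of "nonadditive" can both be made concrete. A 52-bp element has 52 × 3 = 156 possible single substitutions and C(52, 2) × 9 = 11,934 double substitutions, so an assay of >1,000 variants necessarily samples the doubles. One common way to score nonadditivity, assumed here rather than taken from the paper, is the log-scale deviation of a double mutant from the product of its two single-mutant effects. The placeholder sequence and function names are hypothetical.

```python
import math
from typing import List, Tuple

BASES = "ACGT"


def single_mutants(seq: str) -> List[Tuple[int, str, str]]:
    """Every single-nucleotide substitution as (position, reference, alternative)."""
    return [(i, ref, alt) for i, ref in enumerate(seq.upper())
            for alt in BASES if alt != ref]


def n_double_mutants(length: int) -> int:
    """Substitutions at two distinct positions: C(length, 2) position pairs x 3 x 3 alleles."""
    return math.comb(length, 2) * 9


def epistasis(wt: float, a: float, b: float, ab: float) -> float:
    """Log-scale deviation of a double mutant from the product of its single-mutant effects.

    0 means the two mutations combine multiplicatively (additively in log space);
    non-zero values indicate a nonadditive interaction.
    """
    expected = math.log(a / wt) + math.log(b / wt)
    observed = math.log(ab / wt)
    return observed - expected


if __name__ == "__main__":
    cre = "ACGT" * 13  # hypothetical 52-bp placeholder, not the Rho CRE sequence
    print(len(single_mutants(cre)), n_double_mutants(len(cre)))  # 156 11934
    print(epistasis(wt=1.0, a=0.5, b=0.5, ab=0.5))  # > 0: milder than expected
```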
Addressing these problems requires fine-structure mutational analysis of mammalian CREs on a large scale, and such experiments are difficult to perform using traditional assays. To facilitate them, we developed CRE analysis by sequencing (CRE-seq), a high-throughput reporter gene assay for mammalian cells.
CRE-seq leverages recent advances in oligonucleotide (oligo) synthesis (3) and high-throughput sequencing (4). Using array-based oligo synthesis, we construct large numbers of reporter genes with unique sequence barcodes in their 3′ UTRs. These libraries of barcoded reporter genes are then transfected, en masse, into mammalian cells and quantified by performing RNA sequencing (RNA-Seq) (5) on the sequence barcodes. Here we present a study using CRE-seq to dissect a CRE in mouse Rhodopsin (Rho), a gene that is expressed strongly and specifically in the mammalian retina.
Tight control of Rho expression is critical for the function of mammalian retinas (6, 7). Rho expression is regulated in mice by multiple CREs located at varying distances from the transcription start site (TSS) (8, 9). These elements are occupied in vivo by CRX, a re...
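The barcode readout described above turns sequencing counts into expression estimates. A minimal sketch, assuming barcode counts from RNA-seq and from sequencing the input plasmid DNA have already been tallied: each barcode's RNA share is divided by its DNA share, and the ratios are averaged over all barcodes assigned to the same CRE variant. Averaging library-normalized ratios is one common choice, not necessarily the paper's exact procedure; names and counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict


def cre_activity(rna_counts: Dict[str, int], dna_counts: Dict[str, int],
                 barcode_to_cre: Dict[str, str], min_dna: int = 1) -> Dict[str, float]:
    """Mean library-normalized RNA/DNA ratio per CRE across its barcodes."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    ratios = defaultdict(list)
    for barcode, cre in barcode_to_cre.items():
        dna = dna_counts.get(barcode, 0)
        if dna < min_dna:
            continue  # barcode absent from the plasmid pool: ratio is undefined
        rna = rna_counts.get(barcode, 0)
        # Divide by library totals so sequencing depth cancels out.
        ratios[cre].append((rna / rna_total) / (dna / dna_total))
    return {cre: mean(values) for cre, values in ratios.items()}


if __name__ == "__main__":
    # Hypothetical barcode tallies from RNA-seq and plasmid DNA sequencing.
    barcode_to_cre = {"bc1": "WT", "bc2": "WT", "bc3": "mut1", "bc4": "mut1", "bc5": "mut1"}
    rna = {"bc1": 30, "bc2": 30, "bc3": 20, "bc4": 20}
    dna = {"bc1": 25, "bc2": 25, "bc3": 25, "bc4": 25}
    print(cre_activity(rna, dna, barcode_to_cre))
```

Using several barcodes per variant lets barcode-specific effects on transcript stability average out, and the spread across barcodes gives a rough measure of technical noise.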
Transcription factor binding sites (TFBS) are being discovered at a rapid pace (1, 2). We must now begin to turn our attention toward understanding how these sites work in combination to influence gene expression. Quantitative models that accurately predict gene expression from promoter sequence (3-5) will be a crucial part of solving this problem. Here we present such a model based on the analysis of synthetic promoter libraries in yeast. Thermodynamic models based only on the equilibrium binding of transcription factors to DNA and to each other captured a large fraction of the variation in expression in every library. Thermodynamic analysis of these libraries uncovered several phenomena in our system, including cooperativity and the effects of weak binding sites. When applied to the genome, a model of repression by Mig1, trained on synthetic promoters, predicts a number of Mig1-regulated genes that lack significant Mig1 binding sites in their promoters. The success of the thermodynamic approach suggests that the information encoded by combinations of cis-regulatory sites is interpreted primarily through simple protein-DNA and protein-protein interactions, with complicated biochemical reactions, such as nucleosome modifications, being downstream events. Quantitative analyses of synthetic promoter libraries will be an important tool in unraveling the rules underlying combinatorial cis-regulation.
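A thermodynamic model of this kind enumerates every occupancy state of a promoter, weights each state by the binding strengths of the bound factors (times any cooperativity factors), and treats expression as proportional to the equilibrium probability that RNA polymerase is bound. The sketch below is a minimal, generic instance under those assumptions; the activator "Act", the representation of Mig1 repression as steric exclusion of RNAP, and all weights are hypothetical, not the fitted model from the study.

```python
from itertools import product
from typing import Dict, FrozenSet, Set


def p_rnap_bound(weights: Dict[str, float],
                 interactions: Dict[FrozenSet[str], float],
                 exclusive: Set[FrozenSet[str]]) -> float:
    """Equilibrium probability that RNAP is bound, summed over all promoter states.

    weights[x]      statistical weight of factor x alone, e.g. [x] / K_x
    interactions    pairwise cooperativity factors (omega > 1 favors co-binding)
    exclusive       pairs that cannot bind at the same time (e.g. overlapping sites)
    """
    names = sorted(weights)
    z_total = z_on = 0.0
    for occupancy in product((False, True), repeat=len(names)):
        bound = {name for name, on in zip(names, occupancy) if on}
        if any(pair <= bound for pair in exclusive):
            continue  # sterically forbidden state
        w = 1.0
        for name in bound:
            w *= weights[name]
        for pair, omega in interactions.items():
            if pair <= bound:
                w *= omega
        z_total += w
        if "RNAP" in bound:
            z_on += w
    return z_on / z_total


if __name__ == "__main__":
    basal = p_rnap_bound({"RNAP": 1.0}, {}, set())
    activated = p_rnap_bound({"RNAP": 1.0, "Act": 1.0},
                             {frozenset({"Act", "RNAP"}): 10.0}, set())
    repressed = p_rnap_bound({"RNAP": 1.0, "Mig1": 1.0}, {},
                             {frozenset({"Mig1", "RNAP"})})
    print(basal, activated, repressed)  # 0.5, ~0.846, ~0.333
```

Enumerating all 2^n states is only practical for a handful of sites, but it makes explicit how weak sites and cooperativity enter the model: both simply change the weights of particular states.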
Transcription factors (TFs) recognize short sequence motifs that are present in millions of copies in large eukaryotic genomes. TFs must distinguish their target binding sites from a vast genomic excess of spurious motif occurrences; however, it is unclear whether functional sites are distinguished from nonfunctional motifs by local primary sequence features or by the larger genomic context in which motifs reside. We used a massively parallel enhancer assay in living mouse retinas to compare 1,300 sequences bound in the genome by the photoreceptor transcription factor Cone-rod homeobox (Crx) with 3,000 control sequences. We found that very short sequences bound in the genome by Crx activated transcription at high levels, whereas unbound genomic regions with equal numbers of Crx motifs did not activate above background levels, even when liberated from their larger genomic context. High local GC content strongly distinguishes bound motifs from unbound motifs across the entire genome. Our results show that the cis-regulatory potential of TF-bound DNA is determined largely by highly local sequence features and not by genomic context.
(1), yet the sequence features that distinguish functional cis-regulatory sites from the millions of spurious motif occurrences in large eukaryotic genomes are poorly understood (2-6). Several models have been proposed to explain how TFs distinguish between functional cis-regulatory elements (CREs) and nonfunctional motif occurrences (3, 6, 7). In one model, large-scale chromatin context directs TF binding to target sites while limiting TF access to spurious motif occurrences (7-9). This model is supported by recent analyses of genomic DNase I hypersensitivity, which show that only 1% of the genome typically resides in open chromatin in any given cell type (1, 10), suggesting that most spurious motif occurrences are inaccessible.
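The local GC-content comparison can be sketched as scanning for motif occurrences on both strands and measuring the GC fraction of the flanking sequence around each hit. This is an illustration under stated assumptions, not the paper's analysis: `TAATCC` is used as an assumed Crx-like core motif, the motif itself is excluded from the GC calculation, and the flank size and function names are hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def motif_hits(seq: str, motif: str) -> List[int]:
    """Start positions of motif on either strand (reverse-complement hits included)."""
    seq = seq.upper()
    hits = set()
    for pattern in {motif.upper(), reverse_complement(motif.upper())}:
        start = seq.find(pattern)
        while start != -1:
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def flank_gc(seq: str, motif: str, flank: int = 50) -> List[Tuple[int, float]]:
    """GC fraction of the sequence flanking each motif hit (motif itself excluded)."""
    if flank < 1:
        raise ValueError("flank must be at least 1 bp")
    seq = seq.upper()
    results = []
    for start in motif_hits(seq, motif):
        end = start + len(motif)
        flanks = seq[max(0, start - flank):start] + seq[end:end + flank]
        if flanks:
            results.append((start, sum(base in "GC" for base in flanks) / len(flanks)))
    return results


if __name__ == "__main__":
    # TAATCC is used here as an assumed Crx-like core motif, for illustration only.
    seq = "GCGCTAATCCGCGC" + "ATATTAATCCATAT"
    print(flank_gc(seq, "TAATCC", flank=4))  # [(4, 1.0), (18, 0.0)]
```

Applied genome-wide, the resulting GC values for motifs inside bound regions could be compared with those for unbound motifs, which is the kind of contrast the abstract describes.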
A second model states that target sites are recognized through cooperative TF binding to highly specific combinations of sequence motifs, which are unlikely to occur by chance in nonregulatory regions of the genome (6, 11). This model is supported by evidence that the binding specificity of many TFs is affected by cooperative interactions with cofactors (12). A third model states that most TF binding is promiscuous, low-occupancy, and nonfunctional, whereas functional CREs are characterized by high TF occupancy, achieved through either a permissive chromatin context or high affinity for TFs (3, 6). This model is motivated by recent genome-wide binding studies demonstrating that binding locations of functionally diverse TFs overlap substantially (13, 14), a result that suggests binding is unlikely to be primarily determined by rare, specific combinations of cooperative interactions. Regardless of the mechanisms by which TFs select functional CREs, the distinction between functional and nonfunctional motif occurrences must ultimately depend on information encoded either locally or within the larger sequence context surrounding functional CREs.
To distinguish between t...
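The chance-occurrence argument behind the second model can be quantified with a back-of-the-envelope calculation. The sketch assumes a random genome with independent, equiprobable bases (a crude null model that ignores real base composition, palindromic motifs, and repeats); the 3 x 10^9 bp genome length, motif lengths, spacing, and function names are illustrative assumptions.

```python
def expected_occurrences(genome_length: int, k: int, both_strands: bool = True) -> float:
    """Expected hits of one specific non-palindromic k-mer in a random genome.

    Assumes independent, equiprobable bases (a deliberately crude null model).
    """
    positions = genome_length - k + 1
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** k


def expected_pairs(genome_length: int, k1: int, k2: int, max_spacing: int) -> float:
    """Expected chance co-occurrences: motif 2 within max_spacing bp on either side of motif 1."""
    n1 = expected_occurrences(genome_length, k1)
    p2_per_position = 2 / 4 ** k2  # either strand at a given position
    return n1 * (2 * max_spacing) * p2_per_position


if __name__ == "__main__":
    genome = 3_000_000_000  # assumed round figure for a mammalian genome
    print(f"single 6-mer: {expected_occurrences(genome, 6):,.0f}")
    print(f"6-mer pair within 20 bp: {expected_pairs(genome, 6, 6, 20):,.0f}")
```

Under these assumptions a single specific 6-mer is expected roughly 1.5 million times, and a specific pair of 6-mers within 20 bp of each other roughly 29,000 times; each additional required motif shrinks the chance expectation further.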