Despite continual progress in the cataloging of vertebrate regulatory elements, little is known about their organization and regulatory architecture. Here we describe a massively parallel experiment to systematically test the impact of copy number, spacing, combination and order of transcription factor binding sites on gene expression. A complex library of ~5,000 synthetic regulatory elements containing patterns from 12 liver-specific transcription factor binding sites was assayed in mice and in HepG2 cells. We find that certain transcription factors act as direct drivers of gene expression in homotypic clusters of binding sites, independent of spacing between sites, whereas others function only synergistically. Heterotypic enhancers are stronger than their homotypic analogs and favor specific transcription factor binding site combinations, mimicking putative native enhancers. Exhaustive testing of binding site permutations suggests that there is flexibility in binding site order. Our findings provide quantitative support for a flexible model of regulatory element activity and suggest a framework for the design of synthetic tissue-specific enhancers.
Candidate enhancers can be identified on the basis of chromatin modifications, the binding of chromatin modifiers and transcription factors and cofactors, or chromatin accessibility. However, validating such candidates as bona fide enhancers requires functional characterization, typically achieved through reporter assays that test whether a sequence can increase expression of a transcriptional reporter via a minimal promoter. A longstanding concern is that reporter assays are mainly implemented on episomes, which are thought to lack physiological chromatin. However, the magnitude and determinants of differences in cis-regulation for regulatory sequences residing in episomes versus chromosomes remain almost completely unknown. To address this systematically, we developed and applied a novel lentivirus-based massively parallel reporter assay (lentiMPRA) to directly compare the functional activities of 2236 candidate liver enhancers in an episomal versus a chromosomally integrated context. We find that the activities of chromosomally integrated sequences are substantially different from the activities of the identical sequences assayed on episomes, and furthermore are correlated with different subsets of ENCODE annotations. The results of chromosomally based reporter assays are also more reproducible and more strongly predictable by both ENCODE annotations and sequence-based models. With a linear model that combines chromatin annotations and sequence information, we achieve a Pearson's R² of 0.362 for predicting the results of chromosomally integrated reporter assays. This level of prediction is better than with either chromatin annotations or sequence information alone and also outperforms predictive models of episomal assays. Our results have broad implications for how cis-regulatory elements are identified, prioritized and functionally validated.
Enhancers control the timing, location and expression levels of their target genes. Nucleotide variation in enhancers has been shown to lead to numerous phenotypes, including human disease. While putative enhancer sequences and nucleotide variation within them can now be detected rapidly using various genomic technologies, the functional consequences of these variants remain largely unknown. Massively parallel reporter assays (MPRAs) can overcome this hurdle by providing the ability to test thousands of sequences and nucleotide variants within them for enhancer activity en masse. Here, we describe this technology and specifically focus on how it is being used to obtain an increased understanding of enhancer regulatory code and grammar.
The majority of common variants associated with common diseases, as well as an unknown proportion of causal mutations for rare diseases, fall in noncoding regions of the genome. Although catalogs of noncoding regulatory elements are steadily improving, we have a limited understanding of the functional effects of mutations within them. Here, we perform saturation mutagenesis in conjunction with massively parallel reporter assays on 20 disease-associated gene promoters and enhancers, generating functional measurements for over 30,000 single nucleotide substitutions and deletions. We find that the density of putative transcription factor binding sites varies widely between regulatory elements, as does the extent to which evolutionary conservation or integrative scores predict functional effects. These data provide a powerful resource for interpreting the pathogenicity of clinically observed mutations in these disease-associated regulatory elements, and comprise a rich dataset for the further development of algorithms that aim to predict the regulatory effects of noncoding mutations.
Massively parallel reporter assays (MPRAs) functionally screen thousands of sequences for regulatory activity in parallel. To date, there has been no systematic comparison of differences in MPRA design. Here, we screen a library of 2,440 candidate liver enhancers and controls for regulatory activity in HepG2 cells using nine different MPRA designs. We identify subtle but significant differences that correlate with epigenetic and sequence-level features, as well as differences in dynamic range and reproducibility. We also validate en masse that enhancer activity is robustly independent of orientation, at least for our library and designs. Finally, with a new method, we assemble and test the same enhancers as 192-mers, 354-mers, and 678-mers, and observe surprisingly large differences. This work provides a framework for the experimental design of high-throughput reporter assays, suggesting that the extended sequence context of tested elements, and to a lesser degree the precise assay, influence MPRA results.
Massively parallel reporter assays (MPRAs) can simultaneously measure the function of thousands of candidate regulatory sequences (CRSs) in a quantitative manner. In this method, CRSs are cloned upstream of a minimal promoter and reporter gene, alongside a unique barcode, and introduced into cells. If the CRS is a functional regulatory element, it will lead to the transcription of the barcode sequence, which is measured via RNA sequencing and normalized for cellular integration via DNA sequencing of the barcode. This technology has been used to test thousands of sequences and their variants for regulatory activity, to decipher the regulatory code and its evolution, and to develop genetic switches. Lentivirus-based MPRA (lentiMPRA) produces 'in-genome' readouts and enables the use of this technique in hard-to-transfect cells. Here, we provide a detailed protocol for lentiMPRA, along with a user-friendly Nextflow-based computational pipeline, MPRAflow, for quantifying CRS activity from different MPRA designs. The lentiMPRA protocol takes ~2 months, which includes sequencing turnaround time and data processing with MPRAflow.
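The normalization described above, in which barcode RNA counts are divided by barcode DNA counts, can be illustrated with a minimal Python sketch. This is not MPRAflow's implementation: the function name `crs_activity`, the dictionary inputs, the counts-per-million scaling and the pseudocount are all assumptions made for illustration, and the numbers in the usage lines are made up.

```python
import math
from collections import defaultdict


def crs_activity(rna_counts, dna_counts, barcode_to_crs, pseudocount=1.0):
    """Return a hypothetical log2(RNA/DNA) activity score per CRS.

    rna_counts, dna_counts: dicts mapping barcode -> read count.
    barcode_to_crs: dict mapping barcode -> CRS identifier.
    Counts are summed over all barcodes of a CRS, scaled to counts per
    million within each library, and compared as a log2 ratio.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("both libraries need at least one read")

    rna_sum = defaultdict(float)
    dna_sum = defaultdict(float)
    for barcode, crs in barcode_to_crs.items():
        dna = dna_counts.get(barcode, 0)
        if dna == 0:
            # Barcode never seen in DNA: it cannot be normalized, so skip it.
            continue
        rna_sum[crs] += rna_counts.get(barcode, 0)
        dna_sum[crs] += dna

    activity = {}
    for crs in dna_sum:
        rna_cpm = (rna_sum[crs] + pseudocount) / rna_total * 1e6
        dna_cpm = (dna_sum[crs] + pseudocount) / dna_total * 1e6
        activity[crs] = math.log2(rna_cpm / dna_cpm)
    return activity


# Illustrative (made-up) counts: two barcodes for crsA, one for crsB.
rna = {"b1": 30, "b2": 50, "b3": 20}
dna = {"b1": 10, "b2": 10, "b3": 20}
mapping = {"b1": "crsA", "b2": "crsA", "b3": "crsB"}
scores = crs_activity(rna, dna, mapping, pseudocount=0.0)
```

In this toy input, `crsA` receives a positive score because its share of RNA reads exceeds its share of DNA reads, and `crsB` receives a negative one. Summing over barcodes before taking the ratio is one design choice among several; averaging per-barcode ratios is an alternative that weights barcodes equally.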
Epigenomic regulation and lineage-specific gene expression act in concert to drive cellular differentiation, but the temporal interplay between these processes is largely unknown. Using neural induction from human pluripotent stem cells (hPSCs) as a paradigm, we interrogated these dynamics by performing RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), and assay for transposase accessible chromatin using sequencing (ATAC-seq) at seven time points during early neural differentiation. We found that changes in DNA accessibility precede H3K27ac, which is followed by gene expression changes. Using massively parallel reporter assays (MPRAs) to test the activity of 2,464 candidate regulatory sequences at all seven time points, we show that many of these sequences have temporal activity patterns that correlate with their respective cell-endogenous gene expression and chromatin changes. A prioritization method incorporating all genomic and MPRA data further identified key transcription factors involved in driving neural fate. These results provide a comprehensive resource of genes and regulatory elements that orchestrate neural induction and illuminate temporal frameworks during differentiation.