Allosteric communication between distant sites in proteins is central to biological regulation but still poorly characterised, limiting understanding, engineering and drug development [1–6]. An important reason for this is the lack of methods to comprehensively quantify allostery in diverse proteins. Here we address this shortcoming and present a method that uses deep mutational scanning to globally map allostery. The approach uses an efficient experimental design to infer en masse the causal biophysical effects of mutations by quantifying multiple molecular phenotypes (here, binding and protein abundance) in multiple genetic backgrounds and fitting thermodynamic models using neural networks. We apply the approach to two of the most common human protein interaction domains, an SH3 domain and a PDZ domain, to produce comprehensive atlases of allosteric communication. Allosteric mutations are abundant with a large mutational target space of network-altering 'edgetic' variants. Mutations are more likely to be allosteric closer to binding interfaces, at glycines and in specific residues connecting to an opposite surface in the PDZ domain. This general approach of quantifying mutational effects for multiple molecular phenotypes and in multiple genetic backgrounds should allow the energetic and allosteric landscapes of many proteins to be rapidly and comprehensively mapped.
The same mutation can have different effects in different individuals. One important reason for this is that the outcome of a mutation can depend on the genetic context in which it occurs. This dependency is known as epistasis. In recent years, there has been a concerted effort to quantify the extent of pairwise and higher-order genetic interactions between mutations through deep mutagenesis of proteins and RNAs. This research has revealed two major components of epistasis: nonspecific genetic interactions caused by nonlinearities in genotype-to-phenotype maps, and specific interactions between particular mutations. Here, we provide an overview of our current understanding of the mechanisms causing epistasis at the molecular level, the consequences of genetic interactions for evolution and genetic prediction, and the applications of epistasis for understanding biology and determining macromolecular structures.
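The distinction between nonspecific and specific epistasis can be illustrated with a minimal sketch. It assumes a simple two-state folding model as the genotype-to-phenotype map, hypothetical free energy values, and a temperature of about 25 °C; none of these come from the abstract above. Two mutations whose effects are perfectly additive in free energy still show apparent epistasis once the energies pass through the nonlinear map.

```python
import math

RT = 0.593  # kcal/mol at roughly 25 °C (assumed temperature)


def fraction_folded(dg_fold):
    """Two-state folding: fraction folded for a folding free energy in kcal/mol."""
    return 1.0 / (1.0 + math.exp(dg_fold / RT))


def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the sum of the single-mutant effects."""
    return (f_ab - f_wt) - ((f_a - f_wt) + (f_b - f_wt))


# Hypothetical values: a stable wild type and two destabilising mutations.
dg_wt = -2.0
ddg_a = 1.0
ddg_b = 1.0

# Energies are combined additively, so there is no specific interaction.
dg_a = dg_wt + ddg_a
dg_b = dg_wt + ddg_b
dg_ab = dg_wt + ddg_a + ddg_b

# Epistasis measured on the energy scale is zero by construction.
eps_energy = pairwise_epistasis(dg_wt, dg_a, dg_b, dg_ab)

# Epistasis measured on the phenotype scale is non-zero because the map is sigmoidal.
eps_phenotype = pairwise_epistasis(
    fraction_folded(dg_wt),
    fraction_folded(dg_a),
    fraction_folded(dg_b),
    fraction_folded(dg_ab),
)

print(f"epistasis on energy scale:    {eps_energy:.3f}")
print(f"epistasis on phenotype scale: {eps_phenotype:.3f}")
```

In this sketch the phenotype-level interaction is negative: each single mutant is still mostly folded, but the double mutant crosses the steep part of the folding curve. A specific interaction would instead appear as a non-zero term on the energy scale itself.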
Summary: A central question in genetics and evolution is the extent to which mutations have outcomes that change depending on the genetic context in which they occur [1–3]. Pairwise interactions between mutations have been systematically mapped within [4–18] and between [19] genes, and contribute substantially to phenotypic variation amongst individuals [20]. However, the extent to which genetic interactions themselves are stable or dynamic across genotypes is unclear [21,22]. Here we quantify >45,000 genetic interactions between the same 87 pairs of mutations across >500 closely related genotypes of a yeast tRNA. Strikingly, all pairs of mutations interacted in at least 9% of genetic backgrounds and all pairs switched from interacting positively to interacting negatively in different genotypes (FDR < 0.1). Higher order interactions are also abundant and dynamic across genotypes. The epistasis in this molecule means that all individual mutations switch from detrimental to beneficial in even closely related genotypes. As a consequence, accurate genetic prediction requires mutation effects to be measured across different genetic backgrounds and the use of higher order epistatic terms.
Most variants associated with complex traits and diseases identified by genome-wide association studies (GWAS) map to noncoding regions of the genome with unknown effects. Using ancestrally diverse biobank-scale GWAS data, massively parallel CRISPR screens, and single cell transcriptomic and proteomic sequencing, we discovered 124 cis-target genes of 91 noncoding blood trait GWAS loci. Using precise variant insertion via base editing, we connected specific variants with gene expression changes. We also identified trans-effect networks of noncoding loci when cis target genes encoded transcription factors or microRNAs. Networks were themselves enriched for GWAS variants and demonstrated polygenic contributions to complex traits. This platform enables massively parallel characterization of the target genes and mechanisms of human noncoding variants in both cis and trans.
Information that regulates gene expression is encoded throughout each gene, but whether different regulatory regions can be understood in isolation or whether they interact is unknown. Here we measure mRNA levels for 10,000 open reading frames (ORFs) transcribed from either an inducible or constitutive promoter. We find that the strength of cotranslational regulation on mRNA levels is determined by promoter architecture. By using a novel computational genetic screen of 6402 RNA-seq experiments, we identify the RNA helicase Dbp2 as the mechanism by which cotranslational regulation is reduced specifically for inducible promoters. Finally, we find that for constitutive genes, but not inducible genes, most of the information encoding regulation of mRNA levels in response to changes in growth rate is encoded in the ORF and not in the promoter. Thus, the ORF sequence is a major regulator of gene expression, and a nonlinear interaction between promoters and ORFs determines mRNA levels.
The majority of variants associated with complex traits and common diseases identified by genome-wide association studies (GWAS) map to noncoding regions of the genome with unknown regulatory effects in cis and trans. By leveraging biobank-scale GWAS data, massively parallel CRISPR screens and single cell transcriptome sequencing, we discovered target genes of noncoding variants for blood trait loci. The closest gene was often the target gene, but this was not always the case. We also identified trans-effect networks of noncoding variants when cis target genes encoded transcription factors, such as GFI1B and NFE2. We observed that GFI1B trans-target genes were enriched for GFI1B binding sites and fine-mapped GWAS variants, and expressed in human bone marrow progenitor cells, suggesting that GFI1B acts as a master regulator of blood traits. This platform will enable massively parallel assays to catalog the target genes of human noncoding variants in both cis and trans.
Autoregulatory feedback loops occur in the regulation of molecules ranging from ATP to MAP kinases to zinc. Negative feedback loops can increase a system's robustness, while positive feedback loops can mediate transitions between cell states. Recent genome-wide experimental and computational studies predict hundreds of novel feedback loops. However, not all physical interactions are regulatory, and many experimental methods cannot detect self-interactions. Our understanding of regulatory feedback loops is therefore hampered by the lack of high-throughput methods to experimentally quantify the presence, strength and temporal dynamics of autoregulatory feedback loops. Here we present a mathematical and experimental framework for high-throughput quantification of feedback regulation and apply it to RNA binding proteins (RBPs) in yeast. Our method is able to determine the existence of both direct and indirect positive and negative feedback loops, and to quantify the strength of these loops. We experimentally validate our model using two RBPs which lack native feedback loops and by the introduction of synthetic feedback loops. We find that RBP Puf3 does not natively participate in any direct or indirect feedback regulation, but that replacing the native 3'UTR with that of COX17 generates an auto-regulatory negative feedback loop which reduces gene expression noise. Likewise, RBP Pub1 does not natively participate in any feedback loops, but a synthetic positive feedback loop involving Pub1 results in increased expression noise. Our results demonstrate a synthetic experimental system for quantifying the existence and strength of feedback loops using a combination of high-throughput experiments and mathematical modeling. This system will be of great use in measuring auto-regulatory feedback by RNA binding proteins, a regulatory motif that is difficult to quantify using existing high-throughput methods.
Allosteric communication between distant sites in proteins is central to nearly all biological regulation but still poorly characterised for most proteins, limiting conceptual understanding, biological engineering and allosteric drug development. Typically only a few allosteric sites are known in model proteins, but theoretical, evolutionary and some experimental studies suggest they may be much more widely distributed. An important reason why allostery remains poorly characterised is the lack of methods to systematically quantify long-range communication in diverse proteins. Here we address this shortcoming by developing a method that uses deep mutational scanning to comprehensively map the allosteric landscapes of protein interaction domains. The key concept of the approach is the use of 'multidimensional mutagenesis': mutational effects are quantified for multiple molecular phenotypes (here, binding and protein abundance) and in multiple genetic backgrounds. This is an efficient experimental design that allows the underlying causal biophysical effects of mutations to be accurately inferred en masse by fitting thermodynamic models using neural networks. We apply the approach to two of the most common human protein interaction domains, an SH3 domain and a PDZ domain, to produce the first global atlases of allosteric mutations for any proteins. Allosteric mutations are widely dispersed with extensive long-range tuning of binding affinity and a large mutational target space of network-altering 'edgetic' variants. Mutations are more likely to be allosteric closer to binding interfaces, at glycines in secondary structure elements and at particular sites including a chain of residues connecting to an opposite surface in the PDZ domain. This general approach of quantifying mutational effects for multiple molecular phenotypes and in multiple genetic backgrounds should allow the energetic and allosteric landscapes of many proteins to be rapidly and comprehensively mapped.
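Why measuring two phenotypes helps can be sketched with a toy thermodynamic model. The abstract does not give the model's form, so everything here is an assumption for illustration: a three-state equilibrium (unfolded, folded and unbound, folded and bound), ligand concentration absorbed into the binding free energy, hypothetical energy values, and the function names `abundance` and `binding`. The neural network fitting step is not shown.

```python
import math

RT = 0.593  # kcal/mol at roughly 25 °C (assumed temperature)


def abundance(dg_fold):
    """Fraction folded in the absence of ligand (two-state folding)."""
    return 1.0 / (1.0 + math.exp(dg_fold / RT))


def binding(dg_fold, dg_bind):
    """Fraction folded and bound in a three-state model.

    States: unfolded, folded-unbound, folded-bound. Only folded protein binds.
    """
    return 1.0 / (1.0 + math.exp(dg_bind / RT) * (1.0 + math.exp(dg_fold / RT)))


# Hypothetical wild-type energies in kcal/mol.
wt = {"dg_fold": -2.0, "dg_bind": -1.0}

# Hypothetical mutant 1: destabilises the fold, binding energy unchanged.
fold_mutant = {"dg_fold": wt["dg_fold"] + 1.5, "dg_bind": wt["dg_bind"]}

# Hypothetical mutant 2: weakens binding energy, folding energy unchanged.
bind_mutant = {"dg_fold": wt["dg_fold"], "dg_bind": wt["dg_bind"] + 1.5}

for name, g in [("wild type", wt), ("fold mutant", fold_mutant), ("bind mutant", bind_mutant)]:
    a = abundance(g["dg_fold"])
    b = binding(g["dg_fold"], g["dg_bind"])
    print(f"{name:12s} abundance={a:.3f} binding={b:.3f}")
```

Under these assumptions both mutants reduce the binding phenotype, so binding alone cannot tell them apart. Only the first also reduces abundance, which is what lets the two free energy changes be separated. A mutation outside the binding interface that behaves like the second mutant is the kind of effect the abstract describes as allosteric.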