Site saturation mutagenesis of 500 human protein domains reveals the contribution of protein destabilization to genetic disease
Antoni Beltran,
Xiang’er Jiang,
Yue Shen
et al.
Abstract: Missense variants that change the amino acid sequences of proteins cause one third of human genetic diseases¹. Tens of millions of missense variants exist in the current human population, with the vast majority having unknown functional consequences. Here we present the first large-scale experimental analysis of human missense variants across many different proteins. Using DNA synthesis and cellular selection experiments we quantify the impact of >500,000 variants on the abundance of >500 human protein d…
“…For each protein we measured the stability of a median of 8,500 randomly sampled genotypes using a highly validated selection assay that quantifies the cellular concentration of folded protein over at least three orders of magnitude 18,20,45,64,65. The cellular protein abundance measurements were well correlated for each protein between replicate experiments (median Pearson correlation coefficient r = 0.78, Fig.…”
Section: Results (mentioning), confidence: 99%
“…Indeed the exclusion of water by the burial of hydrophobic side chains, the hydrophobic effect, is considered the major driving force in protein folding [5][6][7][8][9][10], and buried core residues are both highly conserved during evolution and very sensitive to mutation [11][12][13][14]. In contrast, solvent-exposed residues on the surfaces of proteins are faster evolving, with mutations typically having much smaller effects on stability 1,[15][16][17][18][19][20]. Surface residues can, however, be important for function, for example by forming binding interfaces.…”
Section: Introduction (mentioning), confidence: 99%
“…At the other extreme, mutations in cores might have largely independent effects, for example if core side-chain packing is highly malleable, allowing re-packing in many different combinations [30][31][32][33][34][35][36][37][38][39]. Unfortunately, quantifying the effects of individual mutations [15][16][17]19,20,[40][41][42] or pairs of mutations 18,[43][44][45] provides little information about the genetic and energetic architecture of cores, as it only explores very local sequence space, revealing the outcome when one or two side chains are changed. Rather, what is needed are experiments where the side chains of many buried core positions are simultaneously changed in many different combinations, an approach referred to as combinatorial mutagenesis or core randomisation 3,[46][47][48][49][50][51][52][53].…”
Protein folding is driven by the burial of hydrophobic amino acids in a tightly packed core that excludes water. The genetics, biophysics and evolution of hydrophobic cores are not well understood, in part because of a lack of systematic experimental data on sequence combinations that do, and do not, constitute stable and functional cores. Here we randomize protein hydrophobic cores and evaluate their stability and function at scale. The data show that vast numbers of amino acid combinations can constitute stable protein cores but that these alternative cores frequently disrupt protein function because of allosteric effects. These strong allosteric effects are not due to complicated, highly epistatic fitness landscapes but rather to the pervasive nature of allostery, with many individually small energy changes combining to disrupt function. Indeed both protein stability and ligand binding can be accurately predicted over very large evolutionary distances using additive energy models with a small contribution from pairwise energetic couplings. As a result, energy models trained on one protein can accurately predict core stability across hundreds of millions of years of protein evolution, with only rare energetic couplings, which we identify experimentally, limiting the transplantation of cores between highly diverged proteins. Our results reveal the simple energetic architecture of protein hydrophobic cores and suggest that allostery is a major constraint on sequence evolution.
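The sketch below illustrates the kind of energy model described above, applied to a combinatorially randomised core. The folding free energy of a multi-mutant is modelled as the wild-type value plus a sum of single-mutant effects plus a few pairwise couplings. A two-state equilibrium then converts that energy into a fraction of folded protein.

All of the following are hypothetical assumptions for illustration, not parameters or methods from the study:

- the ΔΔG values and the coupling term;
- the randomised positions and amino acid alphabets;
- the wild-type energy;
- the sign convention (negative ΔG = stable).

```python
import math
from itertools import product

RT = 0.593  # kcal/mol at roughly 298 K

# Hypothetical folding free energy of the wild-type core (negative = stable).
DG_WT = -3.0

# Hypothetical single-mutant effects in kcal/mol (positive = destabilising),
# keyed by (position, amino acid). Assumed wild type: L10, I24, V31 (ddG = 0).
SINGLE_DDG = {
    (10, "L"): 0.0, (10, "V"): 0.8, (10, "F"): 1.5,
    (24, "I"): 0.0, (24, "V"): 0.6, (24, "L"): 0.4,
    (31, "V"): 0.0, (31, "I"): 0.3, (31, "A"): 1.8,
}

# Hypothetical pairwise energetic coupling: together these two substitutions
# are less destabilising than the sum of their individual effects.
PAIR_DDG = {((10, "F"), (31, "A")): -1.0}

# Hypothetical randomised core positions and the residues allowed at each.
POSITIONS = {10: "LVF", 24: "ILV", 31: "VIA"}

def folding_energy(genotype):
    """Additive model plus pairwise terms: dG = dG_wt + sum(ddG_i) + sum(ddG_ij)."""
    dg = DG_WT + sum(SINGLE_DDG[m] for m in genotype)
    for (a, b), coupling in PAIR_DDG.items():
        if a in genotype and b in genotype:
            dg += coupling
    return dg

def fraction_folded(dg):
    """Two-state equilibrium: fraction folded = 1 / (1 + exp(dG / RT))."""
    return 1.0 / (1.0 + math.exp(dg / RT))

# Combinatorial library: every combination of allowed residues at all positions.
library = [tuple(zip(POSITIONS, aas)) for aas in product(*POSITIONS.values())]
predictions = {g: fraction_folded(folding_energy(g)) for g in library}

# Print the five genotypes predicted to be most stable.
for genotype, f in sorted(predictions.items(), key=lambda kv: -kv[1])[:5]:
    label = " ".join(f"{aa}{pos}" for pos, aa in genotype)
    print(f"{label}: fraction folded = {f:.3f}")
```

Because the model needs only one term per substitution and a handful of coupling terms, it can score every member of the library without measuring each genotype individually. Here that is 3 × 3 × 3 = 27 genotypes.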
“…A second key aspect of our approach is the use of a kinetic selection assay where enrichments report on the rate of a reaction. This is not true for most mutation-selection-sequencing experiments, where enrichments depend on thermodynamic stabilities 30,50,51.…”
Amyloid protein aggregates are pathological hallmarks of more than fifty human diseases, including the most common neurodegenerative disorders. The atomic structures of amyloid fibrils have now been determined, but the process by which soluble proteins nucleate to form amyloids remains poorly characterised and difficult to study, even though understanding this key step is essential to preventing the formation and spread of aggregates. Here we use massively parallel combinatorial mutagenesis, a kinetic selection assay, and machine learning to reveal the transition state of the nucleation reaction of amyloid beta, the protein that aggregates in Alzheimer's disease. By quantifying the nucleation of >140,000 proteins we infer the changes in activation energy for all 798 amino acid substitutions in amyloid beta and the energetic couplings between >600 pairs of mutations. This unprecedented dataset provides the first comprehensive view of the energy landscape and the first large-scale measurement of energetic couplings for a protein transition state. The energy landscape reveals that the amyloid beta nucleation transition state contains a short structured C-terminal hydrophobic core with a subset of interactions similar to mature fibrils. This study demonstrates the feasibility of using mutation-selection-sequencing experiments to study transition states and identifies the key molecular species that initiates amyloid beta aggregation and, potentially, Alzheimer's disease.
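In a kinetic assay, a change in reaction rate relative to wild type corresponds to a change in activation free energy. The sketch below shows only that textbook conversion and assumes transition-state-theory scaling (k ∝ exp(-ΔG‡/RT)). It is not the inference procedure used in the study, which fits a machine-learning model to the selection data. The temperature and the relative rates are hypothetical values.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)
T = 303.0     # hypothetical assay temperature in kelvin

def ddg_activation(k_mut, k_wt, temperature=T):
    """Change in activation free energy from a rate ratio (transition-state theory).

    Since k is proportional to exp(-dG_act / RT), the ratio k_mut / k_wt equals
    exp(-ddG_act / RT), so ddG_act = -RT * ln(k_mut / k_wt).
    Negative values mean the mutation lowers the barrier (faster nucleation).
    """
    return -R * temperature * math.log(k_mut / k_wt)

# Hypothetical nucleation rates relative to wild type (k_wt = 1.0).
relative_rates = {"variant_1": 10.0, "variant_2": 1.0, "variant_3": 0.1}

for name, k in relative_rates.items():
    print(f"{name}: ddG_act = {ddg_activation(k, 1.0):+.2f} kcal/mol")
```

A tenfold change in rate corresponds to about 1.4 kcal/mol at this temperature. The sign of ΔΔG‡ indicates whether a substitution accelerates or slows nucleation.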