Fitness landscapes1,2, depictions of how genotypes manifest at the phenotypic level, form the basis for our understanding of many areas of biology2–7 yet their properties remain elusive. Studies addressing this issue often consider specific genes and their function as proxy for fitness2,4, experimentally assessing the impact on function of single mutations and their combinations in a specific sequence2,5,8–15 or in different sequences2,3,5,16–18. However, systematic high-throughput studies of the local fitness landscape of an entire protein have not yet been reported. Here, we chart an extensive region of the local fitness landscape of the green fluorescent protein from Aequorea victoria (avGFP) by measuring the native function, fluorescence, of tens of thousands of derivative genotypes of avGFP. We find that its fitness landscape is narrow, with half of genotypes with two mutations showing reduced fluorescence and half of genotypes with five mutations being completely non-fluorescent. The narrowness is enhanced by epistasis, which was detected in up to 30% of genotypes with multiple mutations arising mostly through the cumulative impact of slightly deleterious mutations causing a threshold-like decrease of protein stability and concomitant loss of fluorescence. A model of orthologous sequence divergence spanning hundreds of millions of years predicted the extent of epistasis in our data, indicating congruence between the fitness landscape properties at the local and global scales. The characterization of the local fitness landscape of avGFP has important implications for a number of fields including molecular evolution, population genetics and protein design.
Characterizing the fitness landscape, a representation of fitness for a large set of genotypes, is key to understanding how genetic information is interpreted to create functional organisms. Here we determined the evolutionarily-relevant segment of the fitness landscape of His3, a gene coding for an enzyme in the histidine synthesis pathway, focusing on combinations of amino acid states found at orthologous sites of extant species. Just 15% of amino acids found in yeast His3 orthologues were always neutral while the impact on fitness of the remaining 85% depended on the genetic background. Furthermore, at 67% of sites, amino acid replacements were under sign epistasis, having both strongly positive and negative effect in different genetic backgrounds. 46% of sites were under reciprocal sign epistasis. The fitness impact of amino acid replacements was influenced by only a few genetic backgrounds but involved interaction of multiple sites, shaping a rugged fitness landscape in which many of the shortest paths between highly fit genotypes are inaccessible.
MotivationComputational prediction of the effect of mutations on protein stability is used by researchers in many fields. The utility of the prediction methods is affected by their accuracy and bias. Bias, a systematic shift of the predicted change of stability, has been noted as an issue for several methods, but has not been investigated systematically. Presence of the bias may lead to misleading results especially when exploring the effects of combination of different mutations.ResultsHere we use a protocol to measure the bias as a function of the number of introduced mutations. It is based on a self-consistency test of the reciprocity the effect of a mutation. An advantage of the used approach is that it relies solely on crystal structures without experimentally measured stability values. We applied the protocol to four popular algorithms predicting change of protein stability upon mutation, FoldX, Eris, Rosetta and I-Mutant, and found an inherent bias. For one program, FoldX, we manage to substantially reduce the bias using additional relaxation by Modeller. Authors using algorithms for predicting effects of mutations should be aware of the bias described here.Availability and implementationAll calculations were implemented by in-house PERL scripts.Supplementary information Supplementary data are available at Bioinformatics online.NoteThe article 10.1093/bioinformatics/bty348, published alongside this paper, also addresses the problem of biases in protein stability change predictions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.