Genotype-phenotype relationships are notoriously complicated. Idiosyncratic interactions between specific combinations of mutations occur and are difficult to predict. Yet it is increasingly clear that many interactions can be understood in terms of global epistasis. That is, mutations may act additively on some underlying, unobserved trait, and this trait is then transformed via a nonlinear function to the observed phenotype as a result of subsequent biophysical and cellular processes. Here we infer the shape of such global epistasis in three proteins, based on published high-throughput mutagenesis data. To do so, we develop a maximum-likelihood inference procedure using a flexible family of monotonic nonlinear functions spanned by an I-spline basis. Our analysis uncovers dramatic nonlinearities in all three proteins; in some proteins a model with global epistasis accounts for virtually all of the measured variation, whereas in others we find substantial local epistasis as well. This method allows us to test hypotheses about the form of global epistasis and to distinguish variance components attributable to global epistasis, local epistasis, and measurement error.
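The model described above has two parts: an additive latent trait φ, and a monotone nonlinearity g built from an I-spline basis. Below is a minimal sketch of that structure in Python with NumPy, not the authors' code. It assumes binary genotypes and Gaussian measurement noise, so maximum likelihood reduces to least squares. It also assumes a clamped cubic knot vector. The helper names (`make_knots`, `bspline_basis`, `ispline_basis`, `fit_monotone_g`) and all numbers are hypothetical.

The I-spline basis functions are built as tail sums of normalized B-splines, so each one rises monotonically from 0 to 1. Any non-negative combination of them is therefore a monotone nonlinearity. For brevity, the additive effects θ are held fixed and only g is fitted, by projected gradient descent. The procedure in the abstract infers both jointly.

```python
import numpy as np


def make_knots(lo, hi, n_interior, k=3):
    """Clamped knot vector: boundary knots repeated k + 1 times, evenly spaced interior knots."""
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    return np.concatenate([np.full(k + 1, lo), interior, np.full(k + 1, hi)])


def bspline_basis(x, t, k):
    """Normalized B-spline basis of degree k on knot vector t (Cox-de Boor recursion)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Degree 0: indicator of each knot interval [t_i, t_{i+1}).
    B = np.zeros((x.size, len(t) - 1))
    for i in range(len(t) - 1):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # Close the last non-empty interval so that x == t[-1] is covered.
    last = np.flatnonzero(t[:-1] < t[1:])[-1]
    B[x == t[-1], last] = 1.0
    for d in range(1, k + 1):
        B_next = np.zeros((x.size, len(t) - 1 - d))
        for i in range(len(t) - 1 - d):
            left, right = t[i + d] - t[i], t[i + d + 1] - t[i + 1]
            if left > 0:
                B_next[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = B_next
    return B  # shape (len(x), len(t) - k - 1)


def ispline_basis(x, knots, k=3):
    """Monotone I-spline basis: I_i(x) = sum_{m >= i} B_m(x), each rising from 0 to 1."""
    x = np.clip(np.atleast_1d(np.asarray(x, dtype=float)), knots[0], knots[-1])
    B = bspline_basis(x, knots, k)
    tails = np.cumsum(B[:, ::-1], axis=1)[:, ::-1]
    return tails[:, 1:]  # column 0 equals 1 everywhere (partition of unity), so drop it


def fit_monotone_g(phi, y, knots, k=3, n_iter=5000):
    """Least-squares fit of g(phi) = a + I(phi) @ c subject to c >= 0.

    Under Gaussian noise this is the maximum-likelihood estimate of g for fixed phi.
    """
    A = np.column_stack([np.ones(len(phi)), ispline_basis(phi, knots, k)])
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w -= step * (A.T @ (A @ w - y))
        w[1:] = np.maximum(w[1:], 0.0)  # project spline coefficients onto c >= 0
    return w[0], w[1:]


rng = np.random.default_rng(0)
n_variants, n_sites = 500, 20
X = rng.integers(0, 2, size=(n_variants, n_sites))  # hypothetical binary genotypes
theta = rng.normal(0.0, 0.5, n_sites)                # hypothetical additive effects
phi = X @ theta - np.mean(X @ theta)                 # latent additive trait (centred)
y = 1.0 / (1.0 + np.exp(-2.0 * phi)) + rng.normal(0.0, 0.05, n_variants)  # sigmoid g + noise

knots = make_knots(phi.min(), phi.max(), n_interior=4)
a, c = fit_monotone_g(phi, y, knots)
y_hat = a + ispline_basis(phi, knots) @ c
print("RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```

The non-negativity constraint on `c` is what guarantees monotonicity. The flexibility of g comes from the number of interior knots, which is a modelling choice here.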
We present a refractive-index-matched colloidal system that allows direct observation of critical-Casimir-induced aggregation with a confocal microscope. We show that in this system, in which van der Waals forces are negligible, a simple competition between repulsive screened Coulomb and attractive critical Casimir forces can account quantitatively for the reversible aggregation. Above the temperature T_a, the critical Casimir force drives aggregation of the particles into fractal clusters, while below T_a, the electrostatic repulsion between the particles breaks up the clusters, and the particles resuspend by thermal diffusion. Aggregation is observed over a remarkably wide temperature range, spanning as much as 15 degrees. We derive a simple expression for the particle pair potential that accounts quantitatively for the temperature-dependent aggregation and aggregate breakup.
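The competition described above can be illustrated with a toy pair potential. This is a hedged sketch, not the expression derived in the paper. The functional forms are assumptions:

- a screened-Coulomb repulsion that decays with the Debye length 1/κ;
- a critical Casimir attraction that decays on the solvent correlation length ξ, which diverges near the critical temperature as ξ = ξ0·|t|^(−ν), with ν ≈ 0.63 for the 3D Ising universality class.

All amplitudes, Tc, ξ0 and 1/κ are hypothetical values chosen only for illustration.

```python
import numpy as np

NU = 0.63  # correlation-length exponent of the 3D Ising universality class


def correlation_length(T, Tc, xi0):
    """xi = xi0 * |(T - Tc) / Tc|^(-nu): diverges as the critical temperature is approached."""
    return xi0 * np.abs((T - Tc) / Tc) ** (-NU)


def pair_potential(h, xi, A, kappa, B):
    """Toy pair potential (units of kT) versus surface separation h (nm).

    Screened-Coulomb repulsion A*exp(-kappa*h) minus an exponentially decaying
    critical Casimir attraction B*exp(-h/xi). Both forms and amplitudes are assumptions.
    """
    return A * np.exp(-kappa * h) - B * np.exp(-h / xi)


Tc = 307.0                         # hypothetical critical temperature (K)
h = np.linspace(0.1, 100.0, 5000)  # surface separation (nm)
depths = {}
for dT in (3.0, 1.0, 0.3):         # distance from Tc (K)
    xi = correlation_length(Tc - dT, Tc, xi0=0.2)               # xi0 = 0.2 nm, hypothetical
    V = pair_potential(h, xi, A=10.0, kappa=1 / 5.0, B=10.0)   # Debye length 5 nm, hypothetical
    depths[dT] = V.min()
    print(f"|T - Tc| = {dT:.1f} K: xi = {xi:.1f} nm, minimum of V = {V.min():.2f} kT")
```

In this toy, ξ is shorter than the Debye length far from Tc, so the repulsion dominates at every separation and there is no attractive well. Closer to Tc, ξ exceeds the Debye length and a well several kT deep appears. That is the qualitative switch behind reversible aggregation.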
Understanding the relationship between protein sequence, function, and stability is a fundamental problem in biology. The essential function of many proteins that fold into a specific structure is their ability to bind to a ligand, which can be assayed for thousands of mutated variants. However, binding assays do not distinguish whether mutations affect the stability of the binding interface or the overall fold. Here, we introduce a statistical method to infer a detailed energy landscape of how a protein folds and binds to a ligand by combining information from many mutated variants. We fit a thermodynamic model describing the bound, unbound, and unfolded states to high-quality data of protein G domain B1 binding to IgG-Fc. We infer distinct folding and binding energies for each mutation, providing a detailed view of how mutations affect binding and stability across the protein. We accurately infer the folding energy of each variant in physical units, validated by independent data, whereas previous high-throughput methods could only measure indirect changes in stability. While we assume an additive sequence-energy relationship, the binding fraction is epistatic due to its nonlinear relation to energy. Despite having no epistasis in energy, our model explains much of the observed epistasis in binding fraction, with the remaining epistasis identifying conformationally dynamic regions.
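The point that additive energies still produce epistasis in the binding fraction can be made concrete with a standard three-state Boltzmann model: unfolded, folded-unbound, and folded-bound. The sketch below is a minimal illustration under stated assumptions:

- energies are in units of kT;
- ligand concentration is absorbed into the binding energy;
- the mutation effects are hypothetical numbers.

It is not the fitted model from the study.

```python
import math


def bound_fraction(dG_fold, dG_bind):
    """Equilibrium fraction of molecules in the bound state of a three-state model.

    Energies are in units of kT. dG_fold = G_folded - G_unfolded and
    dG_bind = G_bound - G_folded_unbound, with ligand concentration absorbed
    into dG_bind (assumption). Boltzmann weights give
    p_bound = 1 / (1 + exp(dG_bind) * (1 + exp(dG_fold))).
    """
    return 1.0 / (1.0 + math.exp(dG_bind) * (1.0 + math.exp(dG_fold)))


def variant_energies(wt, ddG, mutations):
    """Additive sequence-energy model: add per-mutation (fold, bind) shifts to the wild type."""
    fold = wt[0] + sum(ddG[m][0] for m in mutations)
    bind = wt[1] + sum(ddG[m][1] for m in mutations)
    return fold, bind


# Hypothetical wild-type (fold, bind) energies and two destabilizing mutations
# that leave the binding energy unchanged.
wt = (-4.0, -3.0)
ddG = {"A": (2.0, 0.0), "B": (2.5, 0.0)}

variants = {"wt": [], "A": ["A"], "B": ["B"], "AB": ["A", "B"]}
p = {name: bound_fraction(*variant_energies(wt, ddG, muts)) for name, muts in variants.items()}

# Epistasis in the observed phenotype, even though the energies add exactly.
epistasis = p["AB"] - p["A"] - p["B"] + p["wt"]
print(f"epistasis in bound fraction: {epistasis:.3f}")
```

With these numbers, each single mutation barely changes the bound fraction because the wild type is far from the unfolding threshold. Together they push the protein toward the unfolded state, so the double mutant loses more binding than the two single effects predict.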
The genotype-fitness map plays a fundamental role in shaping the dynamics of evolution. However, it is difficult to directly measure a fitness landscape in practice, because the number of possible genotypes is astronomical. One approach is to sample as many genotypes as possible, measure their fitnesses, and fit a statistical model of the landscape that includes additive and pairwise interactive effects between loci. Here, we elucidate the pitfalls of using such regressions by studying artificial but mathematically convenient fitness landscapes. We identify two sources of bias inherent in these regression procedures, each of which tends to underestimate high fitnesses and overestimate low fitnesses. We characterize these biases for random sampling of genotypes as well as for samples drawn from a population under selection in the Wright-Fisher model of evolutionary dynamics. We show that common measures of epistasis, such as the number of monotonically increasing paths between ancestral and derived genotypes, the prevalence of sign epistasis, and the number of local fitness maxima, are distorted in the inferred landscape. As a result, the inferred landscape will provide systematically biased predictions for the dynamics of adaptation. We identify the same biases in a computational RNA-folding landscape as well as in regulatory sequence binding data treated with the same fitting procedure. Finally, we present a method to ameliorate these biases in some cases.
Keywords: molecular evolution | experimental evolution | penalized regression
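One mechanism that produces the bias pattern described above can be shown in a few lines: high fitnesses are underestimated and low fitnesses overestimated. The sketch fits an additive-plus-pairwise model by ridge-penalized least squares to a hypothetical landscape that also contains third-order terms. Neither the penalty nor those terms is something the fit can capture.

It is an illustrative sketch under these assumptions, not a reproduction of the paper's analysis. It does not claim that these are exactly the two sources of bias the paper identifies. The landscape, penalty strength and helper names (`design_matrix`, `fit_ridge`) are hypothetical.

```python
import itertools

import numpy as np


def design_matrix(G):
    """Intercept, additive, and all pairwise-product columns for 0/1 genotypes G (n x L)."""
    n, L = G.shape
    pairs = itertools.combinations(range(L), 2)
    cols = [np.ones(n)] + [G[:, i] for i in range(L)] + [G[:, i] * G[:, j] for i, j in pairs]
    return np.column_stack(cols).astype(float)


def fit_ridge(A, y, lam):
    """Ridge-penalized least squares; the intercept (column 0) is not penalized."""
    P = lam * np.eye(A.shape[1])
    P[0, 0] = 0.0
    return np.linalg.solve(A.T @ A + P, A.T @ y)


rng = np.random.default_rng(1)
L = 8
G = np.array(list(itertools.product([0, 1], repeat=L)))  # all 2^L genotypes

# Hypothetical "true" landscape: additive + pairwise + unmodeled third-order terms.
A = design_matrix(G)
beta_true = rng.normal(0.0, 1.0, A.shape[1])
triples = itertools.combinations(range(L), 3)
third_order = sum(rng.normal(0.0, 1.0) * G[:, i] * G[:, j] * G[:, k] for i, j, k in triples)
f = A @ beta_true + third_order

f_hat = A @ fit_ridge(A, f, lam=1.0)

# Compare inferred and true fitness among the fittest and least fit 10% of genotypes.
order = np.argsort(f)
m = len(f) // 10
bias_top = np.mean(f_hat[order[-m:]] - f[order[-m:]])
bias_bottom = np.mean(f_hat[order[:m]] - f[order[:m]])
print(f"mean error, top decile: {bias_top:.2f}; bottom decile: {bias_bottom:.2f}")
```

The residual that the pairwise model cannot fit is positively correlated with true fitness. Selecting the fittest genotypes therefore selects positive residuals, so their inferred fitness falls below the truth, and the reverse holds for the least fit. The ridge penalty adds further shrinkage toward the mean.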
Genotype-to-phenotype maps and the related fitness landscapes that include epistatic interactions are difficult to measure because of their high-dimensional structure. Here we construct such a map using recently collected corpora of high-throughput sequence data from the 75-base-pair mutagenized E. coli lac promoter region, where each sequence is associated with its phenotype, the induced transcriptional activity measured by a fluorescent reporter. We find that the additive (non-epistatic) contributions of individual mutations account for about two-thirds of the explainable phenotype variance, while pairwise epistasis explains about 7% of the variance for the full mutagenized sequence and about 15% for the subsequence associated with protein binding sites. Surprisingly, there is no evidence for third-order epistatic contributions, and our inferred fitness landscape is essentially single-peaked, with a small amount of antagonistic epistasis. There is a significant selective pressure on the wild type, which we deduce to be multi-objective optimal for gene expression in environments with different nutrient sources. We identify transcription factor (CRP) and RNA polymerase binding sites in the promoter region and their interactions without difficult optimization steps. In particular, we observe evidence for previously unexplored genetic regulatory mechanisms, possibly kinetic in nature. We conclude with a cautionary note that inferred properties of fitness landscapes may be severely influenced by biases in the sequence data.
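The split of variance into additive and pairwise parts can be sketched with nested least-squares regressions on simulated data. The pairwise increment is the extra R² gained by adding pairwise terms to the additive model.

This sketch simplifies in several ways, all of them assumptions:

- it uses binary "mutated or not" indicators rather than nucleotide identities;
- it reports a fraction of total variance rather than noise-corrected explainable variance;
- the helper names (`features`, `r_squared`) are hypothetical.

The printed numbers describe the simulation, not the lac promoter data.

```python
import itertools

import numpy as np


def features(G, order):
    """Intercept plus all interaction columns up to `order` for 0/1 mutation indicators G."""
    n, L = G.shape
    cols = [np.ones(n)]
    for r in range(1, order + 1):
        for idx in itertools.combinations(range(L), r):
            cols.append(np.prod(G[:, list(idx)], axis=1))
    return np.column_stack(cols).astype(float)


def r_squared(A, y):
    """Fraction of the variance of y explained by an ordinary least-squares fit on A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(2)
n, L = 2000, 10                               # hypothetical: 2000 sequences, 10 positions
G = (rng.random((n, L)) < 0.1).astype(int)    # each position mutated with probability 0.1
beta = np.concatenate([[0.0], rng.normal(0, 1, L), rng.normal(0, 0.3, L * (L - 1) // 2)])
y = features(G, 2) @ beta + rng.normal(0, 0.5, n)  # additive + pairwise + noise

r2_add = r_squared(features(G, 1), y)
r2_pair = r_squared(features(G, 2), y)
print(f"additive R^2: {r2_add:.2f}, pairwise increment: {r2_pair - r2_add:.2f}")
```

Because the models are nested, the pairwise R² can never fall below the additive one on the fitted data. A real analysis would compare models on held-out data, so that the increment is not just overfitting.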
The vertebrate adaptive immune system provides a flexible and diverse set of molecules to neutralize pathogens. Yet, viruses such as HIV can cause chronic infections by evolving as quickly as the adaptive immune system, forming an evolutionary arms race. Here we introduce a mathematical framework to study the coevolutionary dynamics between antibodies and antigens within a host. We focus on changes in the binding interactions between the antibody and antigen populations, which result from the underlying stochastic evolution of genotype frequencies driven by mutation, selection, and drift. We identify the critical viral and immune parameters that determine the distribution of antibody-antigen binding affinities. We also identify definitive signatures of coevolution that measure the reciprocal response between antibodies and viruses, and we introduce experimentally measurable quantities that quantify the extent of adaptation during continual coevolution of the two opposing populations. Using this analytical framework, we infer rates of viral and immune adaptation based on time-shifted neutralization assays in two HIV-infected patients. Finally, we analyze competition between clonal lineages of antibodies and characterize the fate of a given lineage in terms of the state of the antibody and viral populations. In particular, we derive the conditions that favor the emergence of broadly neutralizing antibodies, which may have relevance to vaccine design against HIV.
Antigenic drift of influenza virus hemagglutinin (HA) is enabled by facile evolvability. However, HA antigenic site B, which has become immunodominant in recent human H3N2 influenza viruses, is also evolutionarily constrained by its involvement in receptor binding. Here, we employ deep mutational scanning to probe the local fitness landscape of HA antigenic site B in six different human H3N2 strains spanning from 1968 to 2016. We observe that the fitness landscape of HA antigenic site B can be very different between strains. Sequence variants that exhibit high fitness in one strain can be deleterious in another, indicating that the evolutionary constraints of antigenic site B have changed over time. Structural analysis suggests that the local fitness landscape of antigenic site B can be reshaped by natural mutations via modulation of the receptor-binding mode. Overall, these findings elucidate how influenza virus continues to explore new antigenic space despite strong functional constraints.
The vertebrate adaptive immune system provides a flexible and diverse set of molecules to neutralize pathogens. Yet, viruses such as HIV can cause chronic infections by evolving as quickly as the adaptive immune system, forming an evolutionary arms race. Here we introduce a mathematical framework to study the coevolutionary dynamics of antibodies with antigens within a host. We focus on changes in the binding interactions between the antibody and antigen populations, which result from the underlying stochastic evolution of genotype frequencies driven by mutation, selection, and drift. We identify the critical viral and immune parameters that determine the distribution of antibody-antigen binding affinities. We also identify definitive signatures of coevolution that measure the reciprocal response between antibodies and viruses, and we introduce experimentally measurable quantities that quantify the extent of adaptation during continual coevolution of the two opposing populations. Using this analytical framework, we infer rates of viral and immune adaptation based on time-shifted neutralization assays in two HIV-infected patients. Finally, we analyze competition between clonal lineages of antibodies and characterize the fate of a given lineage in terms of the state of the antibody and viral populations. In particular, we derive the conditions that favor the emergence of broadly neutralizing antibodies, which may be useful in designing a vaccine against HIV.
Introduction
It takes decades for humans to reproduce, but our pathogens can reproduce in less than a day. How can we coexist with pathogens whose potential to evolve is 10⁴ times faster than our own?
In vertebrates, the answer lies in their adaptive immune system, which uses recombination, mutation, and selection to evolve a response on the same timescale on which pathogens themselves evolve.
Among the central actors in the adaptive immune system are B-cells, which recognize pathogens using highly diverse membrane-bound receptors. Naive B-cells are created by processes that generate extensive genetic diversity in their receptors via recombination, insertions and deletions, and hypermutations [1], which can potentially produce ∼10¹⁸ variants in a human repertoire [2]. This estimate of potential lymphocyte diversity outnumbers the total population size of B-cells in humans, i.e., ∼10¹⁰ [3,4]. During an infection, B-cells aggregate to form germinal centers, where they hypermutate at a rate of about 10⁻³ per base pair per cell division over a region of 1-2 kilobase pairs [5]. The B-cell hypermutation rate is approximately 4 to 5 orders of magnitude larger than the average germline mutation rate per cell division in humans [6]. Mutated B-cells compete for survival and proliferation signals from helper T-cells, based on the B-cell receptor's binding to antigens. This form of natural selection is known as affinity maturation, and it can incr...
* Correspondence should be addressed to: Armita Nourmohammad (armitan@princeton.edu).
† Authors with equal contribution