2022
DOI: 10.1038/s41467-022-31643-3

Deciphering polymorphism in 61,157 Escherichia coli genomes via epistatic sequence landscapes

Abstract: Characterizing the effect of mutations is key to understanding the evolution of protein sequences and to separating neutral amino-acid changes from deleterious ones. Epistatic interactions between residues can lead to a context dependence of mutation effects. Context dependence constrains the amino-acid changes that can contribute to polymorphism in the short term, and the ones that can accumulate between species in the long term. We use computational approaches to accurately predict the polymorphisms segregating i…

Citations: cited by 10 publications (14 citation statements)
References: 49 publications
“…Finally, we also observe that some of the more buried residues have higher entropy as computed from the DMS of NDM-1 and VIM-2 than in the DCA model. This observation is consistent with the fact that DCA, which predicts a Gaussian-like distribution of mutational effects 37 , typically tends to underestimate the number of neutral (or almost neutral) mutations that are often one peak in a bimodal distribution 4,6,8 . It can also possibly be due to specific differences in residues and spatial arrangements between the two homologs, as compared to the global distribution of the DCA model.…”
Section: Results (supporting)
confidence: 83%
“…To probe the mutational behavior across the B1 family, we calculated ΔE for all single mutants of 100 diverse homologs, chosen to minimize pairwise sequence identity. To gauge the effects of epistatic networks on each protein position, we measure the mutational tolerance at each position in each homolog as the Shannon entropy of all mutant probabilities relative to the wild type, also referred to as the context-dependent entropy (CDE) 37,38 . The mutational tolerance (CDE) is the base-2 logarithm of the effective number of tolerated mutations at a position in this specific WT background, such that 0 means only 1 (2^0) residue is tolerated (the WT, no mutations), and 4.3 means all 20 (2^4.3)…”
Section: Results (mentioning)
confidence: 99%
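The entropy described in the statement above can be illustrated with a minimal sketch. It assumes, as is common for DCA-style models but not stated in the quoted text, that the probability of each amino acid at a position is proportional to exp(-ΔE), with ΔE = 0 for the wild type and positive ΔE meaning a less favorable residue. The function name `context_dependent_entropy` and the toy ΔE values are hypothetical.

```python
import math

def context_dependent_entropy(delta_e):
    """Shannon entropy (in bits) of the residue distribution at one position.

    delta_e: energy changes for all 20 amino acids in a fixed background,
    with the wild type at 0. Assumption: p(a) is proportional to exp(-delta_e[a]).
    """
    weights = [math.exp(-d) for d in delta_e]
    z = sum(weights)
    probs = [w / z for w in weights]
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy positions (made-up values, not data from any of the cited papers).
fully_tolerant = [0.0] * 20           # every residue as good as the wild type
fully_constrained = [0.0] + [50.0] * 19  # every mutation strongly penalized

for name, de in [("tolerant", fully_tolerant), ("constrained", fully_constrained)]:
    cde = context_dependent_entropy(de)
    # 2**cde is the effective number of tolerated residues.
    print(name, round(cde, 2), round(2 ** cde, 1))
```

Under these assumptions the fully tolerant position gives log2(20) ≈ 4.3 bits (20 effective residues) and the fully constrained one gives essentially 0 bits (1 effective residue), matching the two endpoints named in the quote.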
“…Additionally, the fact that we observe such a high proportion of beneficial non-adaptive mutations suggests that the underlying assumptions of our model, namely site-independence, implying no epistasis, and a static fitness landscape, are a reasonable approximation for the underlying fitness landscape of proteins. Our results imply that the fitness effects of new mutations are mostly conserved across mammalian orthologs, in agreement with other studies showing that for conserved orthologs with similar structures and functions, models without epistasis provide a reasonable estimate of fitness effects in protein coding genes [62, 63].…”
Section: Discussion (supporting)
confidence: 92%
“…This means that observed mutations are close to neutral, as opposed to a strongly deleterious bias of random mutations (remember that our evolutionary model combines mutation and selection in its elementary MCMC step). This is consistent with the DCA energy remaining on average constant along trajectories, but also with the empirical observation that natural amino-acid polymorphisms tend to be neutral [17] when scored by a DCA model. To better quantify this effect and to reduce statistical noise, we consider…”
Section: Context Dependence Shapes Evolutionary Paths: Contingency An... (supporting)
confidence: 88%
“…Our model suggests that the mutational dynamics of sites is linked to their degree of epistasis, which we now quantify by taking inspiration from previous works on the E. coli genome [17] and the SARS-CoV-2 spike protein [36]. We classify each site according to two distinct notions of entropy as a simple information-theoretic measure of mutability.…”
Section: Epistasis Drives the Emergence Of Long Evolutionary Timescales (mentioning)
confidence: 99%