In allosteric proteins, the binding of a ligand modifies function at a distant active site. Such allosteric pathways can be used as targets for drug design, generating considerable interest in inferring them from sequence alignment data. Currently, different methods lead to conflicting results, in particular on the existence of long-range evolutionary couplings between distant amino acids mediating allostery. Here we propose a resolution of this conundrum by studying epistasis and its inference in models where an allosteric material is evolved in silico to perform a mechanical task. We find four types of epistasis (Synergistic, Sign, Antagonistic, Saturation), which can be either short- or long-range and have a simple mechanical interpretation. We perform a Direct Coupling Analysis (DCA) and find that DCA predicts mutation costs well but is a rather poor generative model. Strikingly, it can predict short-range epistasis but fails to capture long-range epistasis, in agreement with empirical findings. We propose that such failure is generic when function requires subparts to work in concert. We illustrate this idea with a simple model, which suggests that other methods may be better suited to capture long-range effects.
Author summary
Allostery in proteins is the property of highly specific responses to ligand binding at a distant site. To inform protocols of de novo drug design, it is fundamental to understand the impact of mutations on allosteric regulation and whether it can be predicted from evolutionary correlations. In this work we consider allosteric architectures artificially evolved to optimize the cooperativity of binding at the allosteric and active sites. We first characterize the emergent pattern of epistasis as well as the underlying mechanical phenomena, finding four types of epistasis (Synergistic, Sign, Antagonistic, Saturation), which can be either short- or long-range. The numerical evolution of these allosteric architectures allows us to benchmark Direct Coupling Analysis, a method that relies on co-evolution in sequence data to infer direct evolutionary couplings, in connection with allostery. We show that Direct Coupling Analysis quantitatively predicts mutation costs but underestimates strong long-range epistasis. We provide an argument, based on a simplified model, illustrating the reasons for this discrepancy, and we propose neural networks as a more promising tool to measure epistasis.
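To make the notion of epistasis concrete, the following minimal sketch treats it as the non-additive part of a double-mutation cost. The function name `pairwise_epistasis`, the sign convention (double cost minus the sum of single costs), and the toy numbers are assumptions made for illustration; the normalization actually used in this work may differ, and no attempt is made here to classify the four epistasis types.

```python
import numpy as np


def pairwise_epistasis(cost_single, cost_double):
    """Non-additive part of double-mutation costs (assumed convention).

    cost_single[i]    : cost of mutating site i alone
    cost_double[i, j] : cost of mutating sites i and j together
    Returns eps[i, j] = cost_double[i, j] - cost_single[i] - cost_single[j].
    """
    cost_single = np.asarray(cost_single, dtype=float)
    cost_double = np.asarray(cost_double, dtype=float)
    # Additive expectation for every pair, built by broadcasting.
    additive = cost_single[:, None] + cost_single[None, :]
    eps = cost_double - additive
    # A site paired with itself is not a double mutant; zero it out.
    np.fill_diagonal(eps, 0.0)
    return eps


# Toy numbers, invented for illustration only.
single = [1.0, 0.5, 2.0]
double = [[0.0, 1.5, 3.0],
          [1.5, 0.0, 4.0],
          [3.0, 4.0, 0.0]]
eps = pairwise_epistasis(single, double)
```

In this toy example the pairs (0, 1) and (0, 2) are purely additive, while the pair (1, 2) costs more than the sum of its single mutations and therefore has non-zero epistasis under the assumed convention.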
Introduction
Allosteric regulation in proteins allows for the control of functional activity by ligand binding at a distal allosteric site [1], and its detection could guide drug design [2, 3]. Yet, understanding the principles responsible for allostery remains a challenge. How random mutations dysregulate allosteric communication is valuable information, studied both experimentally [4] and computationally [5]. Several analyses have highlighted the non-additivity of mutational effects, or epistasis. This "interaction" between mutations can span long-range ...