Abstract: We recently showed that the time-structure based independent component analysis (tICA) method from the Markov state model literature provides a set of variationally optimal slow collective variables for Metadynamics (tICA-Metadynamics). In this paper, we extend the methodology towards efficient sampling of related mutants by borrowing ideas from transfer learning methods in machine learning. Our method explicitly assumes that a similar set of slow modes and metastable states is found in both the wild type (baseline) and its mutants. Under this assumption, we describe a few simple techniques that use sequence mapping to transfer the slow modes and structural information contained in the wild type simulation to a mutant model for enhanced sampling. The resulting simulations can then be reweighted onto the full phase space using the Multi-state Bennett Acceptance Ratio (MBAR), allowing for thermodynamic comparison against the wild type. We first benchmark our methodology by recapturing alanine dipeptide dynamics across a range of different atomistic force fields, including the polarizable Amoeba force field, after learning a set of slow modes using Amber ff99sb-ILDN. We next extend the method by including structural data from the wild type simulation and apply the technique to recapturing the effects of the GTT mutation on the FIP35 WW domain.

Introduction: Efficient sampling of protein configuration space remains an unsolved problem in computational biophysics. While algorithmic advances in molecular dynamics (MD) code bases [1], combined with distributed computing hardware [2], specialized chips [3], and large-scale, increasingly fast GPU clusters, have provided routine access to microsecond timescale dynamics, there is still room for significant improvement. One such avenue is predicting the effects of mutations on the protein's wild type free energy landscape.
At this point it is worth explicitly noting that, while we use the biological terms wild type and mutant extensively, our methodology generalizes readily to scenarios where a baseline (i.e., the wild type) free energy landscape has been mapped out and we now wish to understand the dynamical consequences of making some change to the system (i.e., the mutation). Under the current scheme, one would have to re-run the entire simulation to ascertain the effects of a mutation on a protein's free energy landscape. Due to the vast