We present a probabilistic search algorithm for rigid-body protein-protein docking. The algorithm is a realization of the basin hopping framework for sampling low-energy local minima of a given energy function. To save computational resources, the algorithm uses a machine learning model to score bound configurations and subjects only the promising ones to local optimization with a sophisticated force field. The machine learning model is a decision tree trained on 138 known native dimeric interactions to learn the features that constitute a true interaction interface. The FoldX force field is applied only to sampled dimeric configurations that the decision tree model determines to contain true interaction interfaces. The preliminary results are promising and motivate further investigation of such an informatics-driven approach to protein-protein docking.
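The gating idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_energy` stands in for a real force field such as FoldX, `cheap_filter` stands in for the trained decision tree, `local_minimize` is a crude stochastic descent, and the Metropolis acceptance rule is an assumption, since the text does not state the acceptance criterion. All function names and parameters are hypothetical.

```python
import math
import random


def toy_energy(x):
    # Hypothetical rugged energy surface; a stand-in for a real force field.
    return sum(xi * xi + 2.0 * math.sin(5.0 * xi) for xi in x)


def cheap_filter(x):
    # Hypothetical stand-in for the learned classifier. A real model would
    # score interface features; here we only apply a crude bound.
    return all(abs(xi) < 3.0 for xi in x)


def local_minimize(x, energy, step=0.05, iters=200, rng=random):
    # Stochastic descent standing in for the expensive local optimization.
    best, best_e = list(x), energy(x)
    for _ in range(iters):
        cand = [xi + rng.uniform(-step, step) for xi in best]
        e = energy(cand)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e


def basin_hopping(start, energy, n_steps=100, jump=1.0, temperature=1.0, seed=0):
    rng = random.Random(seed)
    current, current_e = local_minimize(start, energy, rng=rng)
    best, best_e = current, current_e
    n_minimized = 0  # how many candidates passed the filter
    for _ in range(n_steps):
        # Perturbation: a large jump away from the current local minimum.
        cand = [xi + rng.uniform(-jump, jump) for xi in current]
        # Gate: skip the costly minimization for unpromising candidates.
        if not cheap_filter(cand):
            continue
        n_minimized += 1
        cand, cand_e = local_minimize(cand, energy, rng=rng)
        # Metropolis criterion (an assumption, not taken from the text).
        delta = cand_e - current_e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = cand, cand_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e, n_minimized
```

Calling `basin_hopping([2.0] * 6, toy_energy)` returns the lowest minimum found, its energy, and the number of candidates that reached the minimization step, which gives a rough measure of how much work the filter saved.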
General Terms
Algorithms
Keywords
Protein-protein rigid docking; machine learning; native interaction; basin hopping
BACKGROUND
Proteins adopt specific three-dimensional shapes that they use to bind other molecules and thereby perform specific cellular tasks. Modeling these bound complexes is key to characterizing supramolecular assemblies and thus to understanding the molecular basis of biological function. Template-free structural characterization of protein assemblies entails searching a high-dimensional configuration space. Simplifying protein-protein docking to its rigid-body dimeric version reduces the dimensionality of the search space to the 6 parameters needed to encode spatial arrangements of the moving unit around the reference unit.
Though significant computational effort is devoted to pairwise rigid-body protein-protein docking, the problem remains challenging. The difficulty lies primarily in the limited exploration capability of the search algorithms, in the accuracy of the criterion used to guide them to the true native assembly, or in a combination of both. Recent work has produced probabilistic search algorithms with high exploration capability [13,14,7,8]. However, guiding these algorithms by an energy function presents a problem, as all current energy functions, even physics-based ones, contain errors and distort the true underlying energy surface.
To address this issue, a complementary direction of research focuses on learning aspects of the native interaction interface and encoding them either explicitly in the search process itself [5,6,7] or implicitly in a pseudo-energy function [10,1,2]....
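As an aside on the rigid-body simplification described in the Background, the 6-parameter encoding can be sketched as three translations plus three rotation angles applied to the moving unit. The parameterization below (Z-Y-X Euler angles, rotation about the origin followed by translation) is an assumption chosen for illustration; the text does not specify which encoding is used, and the names `rotation_matrix` and `place` are hypothetical.

```python
import math


def rotation_matrix(alpha, beta, gamma):
    # R = Rz(alpha) * Ry(beta) * Rx(gamma), angles in radians.
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]


def place(coords, params):
    # params = (tx, ty, tz, alpha, beta, gamma): the 6 rigid-body parameters.
    tx, ty, tz, alpha, beta, gamma = params
    r = rotation_matrix(alpha, beta, gamma)
    placed = []
    for x, y, z in coords:
        # Rotate about the origin, then translate.
        placed.append((
            r[0][0] * x + r[0][1] * y + r[0][2] * z + tx,
            r[1][0] * x + r[1][1] * y + r[1][2] * z + ty,
            r[2][0] * x + r[2][1] * y + r[2][2] * z + tz,
        ))
    return placed
```

Because the transformation is rigid, all interatomic distances within the moving unit are preserved; only its pose relative to the reference unit changes, which is why a search over these 6 numbers covers every dimeric arrangement of two rigid units.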