Boosting Virtual Screening Enrichments with Data Fusion: Coalescing Hits from Two-Dimensional Fingerprints, Shape, and Docking

Sastry, G. Madhavi; Inakollu, V. S. Sandeep; Sherman, Woody

doi:10.1021/ci300463g

Cited by 76 publications (94 citation statements); references 75 publications.


“…Since the rank alone lacks information about the data spread, we instead used Z scores for the normalization of raw scores [52]. In addition, our Z scores are based on the median value instead of the average value for each scoring function, which reduces the sensitivity of Z scores to outliers.…”

Section: Results and Discussion (mentioning)

confidence: 99%
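The quoted normalization can be illustrated with a short sketch: each scoring function's raw scores are centered on their median, then the normalized columns are averaged into one fused score per compound. The quote specifies only that the median replaces the mean; using the population standard deviation as the scale, and averaging Z scores as the fusion rule, are our assumptions, and the function names are hypothetical. Scores are assumed oriented so that higher is better (negate docking energies first).

```python
import statistics


def median_z_scores(scores):
    """Median-centered Z scores for one scoring function.

    Assumption: scale = population standard deviation; the quoted text
    only states that the median replaces the mean as the center.
    """
    center = statistics.median(scores)
    spread = statistics.pstdev(scores)
    if spread == 0:
        # All scores identical: no information, so every compound gets 0.
        return [0.0 for _ in scores]
    return [(s - center) / spread for s in scores]


def fuse_by_mean_z(score_table):
    """Fuse several scoring functions by averaging their median Z scores.

    score_table maps a method name to a list of raw scores, all lists in
    the same compound order and oriented so that higher means better
    (e.g. negate docking energies before calling this).
    """
    normalized = [median_z_scores(col) for col in score_table.values()]
    n_compounds = len(normalized[0])
    return [
        sum(col[i] for col in normalized) / len(normalized)
        for i in range(n_compounds)
    ]
```

Because the center is the median, the single outlier in a list such as `[1, 2, 3, 4, 100]` does not shift which compound sits at Z = 0; it still inflates the standard deviation, which is why some authors prefer a median absolute deviation as the scale.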

Assessment of Solvated Interaction Energy Function for Ranking Antibody–Antigen Binding Affinities

Sulea; Vivcharuk; Corbeil; et al. J. Chem. Inf. Model., 2016


ABSTRACT: Affinity modulation of antibodies and antibody fragments of therapeutic value is often required to improve their clinical efficacy. Virtual affinity maturation has the potential to focus quickly on the critical hotspot residues without the combinatorial explosion of conventional display and library approaches. However, this requires a binding-affinity scoring function capable of ranking single-point mutations of a starting antibody. We focus here on assessing the solvated interaction energy (SIE) function, which was originally developed for, and is widely applied to, scoring protein−ligand binding affinities. To this end, we assembled a structure−function data set called Single-Point Mutant Antibody Binding (SiPMAB), comprising several antibody−antigen systems suitable for this assessment, i.e., based on high-resolution crystal structures of the parent antibodies coupled with high-quality binding affinity measurements for sets of single-point antibody mutants in each system.
Using this data set, we tested the SIE function with several mutation protocols based on the popular methods SCWRL, Rosetta, and FoldX. We found that the SIE function coupled with a protocol limited to sampling only the mutated side chain can reasonably predict relative binding affinities, with a Spearman rank-order correlation coefficient of about 0.6, outperforming more aggressive sampling protocols. Importantly, this performance is maintained for each of the seven system-specific component subsets as well as for other relevant subsets, including non-alanine and charge-altering mutations. Transferability and enrichment in affinity-improving mutants can be further enhanced by consensus ranking over multiple methods, including the SIE, Talaris, and FOLDEF energy functions. The knowledge gained from this study can lead to successful prospective applications of virtual affinity maturation.
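The abstract evaluates predictions with the Spearman rank-order correlation and improves them with consensus ranking over several energy functions. The sketch below shows both in plain Python; it is not the authors' code. Assumptions: tied values receive the average of their ranks, lower predicted energy means a better (higher-affinity) mutant, and the consensus is a simple mean of ranks, which may differ from the scheme used in the study.

```python
import math


def average_ranks(values):
    """Rank values ascending (1 = smallest); ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def consensus_rank(score_lists):
    """Mean rank of each mutant across several scoring functions.

    Each inner list holds one function's predicted energies for the same
    mutants in the same order; lower energy ranks first (rank 1).
    """
    ranked = [average_ranks(scores) for scores in score_lists]
    n = len(score_lists[0])
    return [sum(r[i] for r in ranked) / len(ranked) for i in range(n)]
```

Working on ranks rather than raw energies is what lets functions with very different scales (SIE, Talaris, FOLDEF) be averaged without any further normalization.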


“…Furthermore, our results were compared with recent similar studies, such as rank-based group fusion by Chen et al. [42] and standard score (Z-score) fusion by Sastry et al. [39]. In the study by Chen et al., the mean recall of their RKP method on the MDDR1 data set ranges from 94.20 to 94.30, whereas in our method the minimum value of the upper band is 95.27 for Top10 and the maximum value is 99.95 for the Top100 method.…”

Section: Results (mentioning)

confidence: 78%
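The comparison above is expressed as mean recall. As a minimal sketch, recall at a cutoff is the percentage of known actives found among the top-ranked compounds, averaged over queries. We assume, for illustration only, that a cutoff such as Top10 means the 10 best-ranked compounds; the quoted study may define its cutoffs (and its "upper band") differently, and the function names are hypothetical.

```python
def recall_at_top(ranked_ids, active_ids, top_n):
    """Percentage of known actives retrieved within the first top_n compounds.

    ranked_ids: compound IDs ordered best first (e.g. after fusion).
    active_ids: IDs of the compounds known to be active for this query.
    """
    actives = set(active_ids)
    if not actives:
        raise ValueError("recall is undefined without at least one active")
    hits = sum(1 for cid in ranked_ids[:top_n] if cid in actives)
    return 100.0 * hits / len(actives)


def mean_recall(queries, top_n):
    """Average recall over several queries (e.g. one per reference structure).

    queries: list of (ranked_ids, active_ids) pairs.
    """
    return sum(recall_at_top(r, a, top_n) for r, a in queries) / len(queries)
```

A percentage cutoff such as "top 5%" maps onto the same function with `top_n = int(0.05 * len(ranked_ids))`.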

Condorcet and Borda count fusion method for ligand-based virtual screening

et al. 2014


Background: It is known that no individual similarity measure always gives the best recall of active molecules for all types of activity classes. Recent work has shown that the effectiveness of ligand-based virtual screening can be enhanced by data fusion. Data fusion can be implemented in two different ways: group fusion and similarity fusion. Similarity fusion involves searching with multiple similarity measures; the similarity scores, or rankings, from each measure are combined to obtain the final ranking of the compounds in the database.

Results: The Condorcet fusion method was examined. This approach combines the outputs of similarity searches from eleven association and distance similarity coefficients, and the winning measure for each class of molecules, based on Condorcet fusion, was then chosen as the best search method. The recall of active molecules retrieved in the top 5% and a significance test were used to evaluate the proposed method. The MDL Drug Data Report (MDDR), Maximum Unbiased Validation (MUV), and Directory of Useful Decoys (DUD) data sets were used for the experiments and were represented by 2D fingerprints.

Conclusions: Simulated virtual screening experiments with the two standard data sets show that Condorcet fusion provides a very simple way of improving ligand-based virtual screening, especially when the active molecules being sought have a low degree of structural heterogeneity. However, the effectiveness of Condorcet fusion increased only slightly when highly diverse activity classes were being sought.
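The two rank-fusion rules named in this entry can be sketched in a few lines. This is a generic illustration, not the cited authors' implementation (the abstract uses Condorcet fusion to select a winning similarity measure per activity class). Borda count awards each compound points by position in every ranking; the Condorcet sketch orders compounds with a pairwise-majority comparator, one common way to realize Condorcet fusion. With cyclic majorities that comparator is not transitive, so the resulting order is only one of several defensible ones. Function names are hypothetical.

```python
from functools import cmp_to_key


def borda_fuse(rankings):
    """Borda count fusion.

    rankings: lists of the same compound IDs, each ordered best first.
    A compound at 0-based position p in an n-item list earns n - p points.
    """
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for position, compound in enumerate(ranking):
            points[compound] = points.get(compound, 0) + (n - position)
    # Highest total first; ties broken by compound ID for determinism.
    return sorted(points, key=lambda c: (-points[c], c))


def condorcet_fuse(rankings):
    """Order compounds by pairwise majority vote across the rankings."""
    positions = [{c: p for p, c in enumerate(r)} for r in rankings]

    def majority(a, b):
        # Count how many rankings place a ahead of b.
        a_wins = sum(1 for pos in positions if pos[a] < pos[b])
        b_wins = len(positions) - a_wins
        if a_wins > b_wins:
            return -1  # a goes first
        if b_wins > a_wins:
            return 1
        return 0  # tie: stable sort keeps the order of the first ranking

    return sorted(rankings[0], key=cmp_to_key(majority))
```

For three similarity measures that rank compounds as A>B>C, B>A>C, and A>C>B, both rules place A first, then B, then C; they diverge on inputs where a compound wins most head-to-head contests without accumulating the most positional points.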


“…Once all predictors have calculated their predictions, the predictions are combined ("fused", as in [7]) into one matrix.…”

Section: The Chemogenomics Pipeline (mentioning)

confidence: 99%
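The quoted pipeline fuses the outputs of several predictors into one matrix but does not say how. The sketch below is a hypothetical illustration, not the Janssen pipeline: each predictor is assumed to emit a mapping from (compound, target) pairs to scores, and the fused matrix entry is assumed to be the mean over the predictors that scored that pair, with `None` where no predictor did.

```python
def fuse_predictions(predictor_outputs):
    """Fuse per-predictor scores into one compound x target matrix.

    predictor_outputs: list of dicts mapping (compound_id, target_id) -> score
    (a hypothetical format). Returns (compounds, targets, matrix), where
    matrix[i][j] is the mean score for compounds[i] against targets[j], or
    None when no predictor produced a score for that pair.
    """
    sums, counts = {}, {}
    for output in predictor_outputs:
        for pair, score in output.items():
            sums[pair] = sums.get(pair, 0.0) + score
            counts[pair] = counts.get(pair, 0) + 1
    compounds = sorted({c for c, _ in sums})
    targets = sorted({t for _, t in sums})
    matrix = [
        [sums[(c, t)] / counts[(c, t)] if (c, t) in sums else None for t in targets]
        for c in compounds
    ]
    return compounds, targets, matrix
```

A mean is the simplest choice; the score-normalization ideas discussed earlier on this page (e.g. median-centered Z scores per predictor) could be applied before averaging if predictors report on different scales.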

“…Inside Janssen Pharmaceutica, a method is used which takes advantage of the fact that compounds with a similar structure often interact similarly with the same proteins. The Chemogenomics project inside Janssen Pharmaceutica attempts to identify candidate compounds by deriving information from existing compound-protein databases by means of machine learning methods [3,6,7]. To this end, a number of "predictor" programs have been developed.…”

Section: Introduction (mentioning)

confidence: 99%

Scaling machine learning for target prediction in drug discovery using Apache Spark

Harnie

Saey

Vapirev

et al. 2017

Future Generation Computer Systems


We used Spark to distribute C++ predictors automatically over a cluster. Our Spark application achieves near-linear speedup and optimal cluster utilization. The core of the algorithm is easily changed to allow for experimentation.

Abstract: In the context of drug discovery, a key problem is the identification of candidate molecules that affect proteins associated with diseases. Inside Janssen Pharmaceutica, the Chemogenomics project aims to derive new candidates from existing experiments through a set of machine learning predictor programs written in single-node C++. These programs take a long time to run and are inherently parallel, but do not use multiple nodes. We show how we reimplemented the pipeline using Apache Spark, which enabled us to lift the existing programs to a multi-node cluster without changing the predictors. We benchmarked our Spark pipeline against the original; it shows almost linear speedup up to 8 nodes. In addition, our pipeline generates fewer intermediate files while allowing easier checkpointing and monitoring.
