Benchmarking Study of Parameter Variation When Using Signature Fingerprints Together with Support Vector Machines

Alvarsson, Jonathan; Eklund, Martin; Andersson, Claes; Carlsson, Lars; Spjuth, Ola; Wikberg, Jarl E. S.

doi:10.1021/ci500344v

Cited by 35 publications

(35 citation statements)

References 39 publications


“…Scikit-learn implementation of SVM is based on the Libsvm and the Liblinear libraries [66, 67]. When optimizing the SVM using the non-linear RBF kernel the values for hyper-parameters gamma ( γ ) and Cost ( C ) were selected with similar range to those reported by Alvarsson et al. [68]. Here values for gamma ( γ ) tested were 10e−1, 10e−2, 10e−3, 10e−4, 10e−5 and for cost ( C ) 1, 10, 100, 1000.…”

Section: Methods (mentioning)

confidence: 99%

Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data

et al. 2017


Background: In recent years, research in artificial neural networks has resurged, now under the deep-learning umbrella, and grown extremely popular. Recently reported success of DL techniques in crowd-sourced QSAR and predictive toxicology competitions has showcased these methods as powerful tools in drug-discovery and toxicology research. The aim of this work was dual, first large number of hyper-parameter configurations were explored to investigate how they affect the performance of DNNs and could act as starting points when tuning DNNs and second their performance was compared to popular methods widely employed in the field of cheminformatics namely Naïve Bayes, k-nearest neighbor, random forest and support vector machines. Moreover, robustness of machine learning methods to different levels of artificially introduced noise was assessed. The open-source Caffe deep-learning framework and modern NVidia GPU units were utilized to carry out this study, allowing large number of DNN configurations to be explored.

Results: We show that feed-forward deep neural networks are capable of achieving strong classification performance and outperform shallow methods across diverse activity classes when optimized. Hyper-parameters that were found to play critical role are the activation function, dropout regularization, number hidden layers and number of neurons. When compared to the rest methods, tuned DNNs were found to statistically outperform, with p value <0.01 based on Wilcoxon statistical test. DNN achieved on average MCC units of 0.149 higher than NB, 0.092 than kNN, 0.052 than SVM with linear kernel, 0.021 than RF and finally 0.009 higher than SVM with radial basis function kernel. When exploring robustness to noise, non-linear methods were found to perform well when dealing with low levels of noise, lower than or equal to 20%, however when dealing with higher levels of noise, higher than 30%, the Naïve Bayes method was found to perform well and even outperform at the highest level of noise 50% more sophisticated methods across several datasets.

Electronic supplementary material: The online version of this article (doi:10.1186/s13321-017-0226-y) contains supplementary material, which is available to authorized users.
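The citation statement for this paper lists five γ values and four cost values for the RBF-kernel SVM. As a minimal sketch of what that search space looks like, the snippet below enumerates every (C, γ) pair. It rests on two assumptions that are not confirmed by the quoted text: that "10e−1 … 10e−5" is meant as the powers of ten 10^-1 … 10^-5 (read literally as floating-point literals the values would be ten times larger), and that all combinations were tried. The names `param_grid` and `expand_grid` are hypothetical.

```python
from itertools import product

# Assumption: the quoted gamma values are read as 10^-1 ... 10^-5.
GAMMAS = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]
# Cost values as listed in the citation statement.
COSTS = [1, 10, 100, 1000]

param_grid = {"gamma": GAMMAS, "C": COSTS}


def expand_grid(grid):
    """Yield one dict per combination of the listed hyper-parameter values."""
    names = sorted(grid)  # fixed order so the enumeration is reproducible
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


candidates = list(expand_grid(param_grid))
print(len(candidates), "hyper-parameter combinations")
for params in candidates[:3]:
    print(params)
```

A dictionary of this shape is also the form that scikit-learn's `GridSearchCV` accepts as `param_grid`, so the same grid could be handed to it together with `SVC(kernel="rbf")` if scikit-learn is installed.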


“…The two major categories of vHTS approaches are molecular simulation (e.g., AutoDock, DOCK, Flex, AMBER, GROMACS, CHARMM) [41-52] and ligand-based scoring [49, 53-55]. Molecular simulation uses models and concepts from physics, chemistry, and mathematics to make optimized ligand/substrate interaction predictions. Although molecular simulations require minimal experimental data, it is computationally expensive and requires numerical solution convergence for confidence.…”

Section: Introduction (mentioning)

confidence: 99%

Identifying new clotting factor XIa inhibitors in virtual high‐throughput screens using PCA‐GA‐SVM models and signature

Chen, Schmucker, Visco. 2018. Biotechnology Progress


Blood Clotting Factor XI is an important actor in the clotting mechanism: it activates downstream zymogen involved in the clotting process. It can be targeted for activation or inhibition depending on treatment goals to enhance or inhibit clotting. In terms of antithrombosis treatment, Factor XI has emerged as a promising target to focus on. In this work, an iterative virtual high-throughput screening pipeline was proposed that can supplement current efforts to find inhibitors. The first iteration identified 11 compounds to test with 3 active for a hit-rate of 27.3%. The second iteration of the pipeline identified another 11 compounds to test with 7 active for a hit-rate of 63.6%. © 2018 American Institute of Chemical Engineers Biotechnol. Prog., 2018.


“…Often, the QSAR model algorithms come with free parameters that need to be determined, e.g., support vector machines based on the radial basis function has the free parameters γ and cost [47] and k-nearest neighbour has k [46]. A common way of determining actual values for parameters such as these is a grid search or “parameter sweep”.…”

Section: Methods (mentioning)

confidence: 99%

Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles

2016

Self Cite


Predictive modelling in drug discovery is challenging to automate as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can aid in many of these challenges, but the currently available systems are lacking in the functionality needed to enable agile and flexible predictive modelling. We here present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system which we name SciLuigi. We also discuss the experiences from using the approach when modelling a large set of biochemical interactions using a shared computer cluster.
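The citation statement for this paper describes determining free parameters such as k for k-nearest neighbour by a grid search or "parameter sweep". The following is a minimal, self-contained sketch of that idea: each candidate k is scored by leave-one-out cross-validation and the best-scoring value is kept. The toy data set, the 1-D distance, and the function names (`knn_predict`, `loo_accuracy`, `sweep`) are all hypothetical and are not taken from the cited work or from SciLuigi.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query` (1-D)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-NN for one candidate value of k."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out the i-th point
        hits += knn_predict(train, x, k) == label
    return hits / len(data)


def sweep(data, k_values):
    """Score every candidate k; return the best k and all scores."""
    scores = {k: loo_accuracy(data, k) for k in k_values}
    best_k = max(scores, key=scores.get)  # first maximum wins on ties
    return best_k, scores


# Made-up 1-D data set for illustration: (feature, class label).
DATA = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.9, "b"), (1.0, "b"), (1.1, "b")]

best_k, scores = sweep(DATA, [1, 3, 5])
print("scores per k:", scores)
print("selected k:", best_k)
```

Because every candidate is scored independently of the others, the evaluations in such a sweep can run as separate tasks, which is one reason parameter tuning produces the task dependencies that the abstract mentions.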

