2003
DOI: 10.1021/ci034143r

Spline-Fitting with a Genetic Algorithm:  A Method for Developing Classification Structure−Activity Relationships

Abstract: Classification methods allow for the development of structure-activity relationship models when the target property is categorical rather than continuous. We describe a classification method which fits descriptor splines to activities, with descriptors selected using a genetic algorithm. This method, which we identify as SFGA, is compared to the well-established techniques of recursive partitioning (RP) and soft independent modeling by class analogy (SIMCA) using five series of compounds: cyclooxygenase-2 (COX…
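As a hedged illustration of the approach the abstract outlines (not the authors' SFGA code), the sketch below uses a simple genetic algorithm to select descriptor subsets and scores each subset by the cross-validated accuracy of a spline-based classifier built on it. scikit-learn's SplineTransformer plus logistic regression stand in for the paper's spline-fitting step, and the synthetic data, population size, and mutation rate are illustrative assumptions.

```python
# Minimal sketch (not the authors' SFGA implementation): a genetic algorithm
# selects a subset of descriptors, and a spline expansion of the selected
# descriptors is fit to the class labels. All data and GA settings are synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                 # 200 compounds, 30 descriptors (synthetic)
y = (X[:, 3] ** 2 + X[:, 7] > 1).astype(int)   # synthetic categorical activity

def fitness(mask):
    """Cross-validated accuracy of a spline classifier on the selected descriptors."""
    if not mask.any():
        return 0.0
    model = make_pipeline(SplineTransformer(degree=3, n_knots=5),
                          LogisticRegression(max_iter=1000))
    return cross_val_score(model, X[:, mask], y, cv=5).mean()

n_desc, pop_size, n_gen = X.shape[1], 16, 10
pop = rng.random((pop_size, n_desc)) < 0.2     # initial population of descriptor masks

for gen in range(n_gen):
    scores = np.array([fitness(ind) for ind in pop])
    order = np.argsort(scores)[::-1]
    parents = pop[order[: pop_size // 2]]      # truncation selection
    children = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_desc)          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_desc) < 0.02       # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected descriptors:", np.flatnonzero(best))
```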

Cited by 159 publications (52 citation statements)
References 35 publications (60 reference statements)
“…Many SVM, kNN or naive Bayes models were generated from the same number of compounds but with differing molecular descriptor sets, referred to as vary(MDes) in the figure, owing to its efficiency and ease of use. In addition, the random method has been shown to be effective in methods such as Random Forest and Random Decision Trees [50,80]. This process was repeated 50 times, that is, there were 50 ensemble models built from each combination of 5 base classifiers, 9 base classifiers, etc.…”
Section: Modelling (mentioning)
confidence: 99%
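A minimal sketch of the ensemble strategy this citing study describes (illustrative, not the cited study's code): each base classifier (SVM, kNN, or naive Bayes) is trained on the same compounds but on a randomly drawn descriptor subset, and the ensemble predicts by majority vote. The dataset, subset size, and ensemble size below are assumptions.

```python
# Mixed-algorithm ensemble with randomly varied descriptor subsets ("vary(MDes)").
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=40, random_state=0)
rng = np.random.default_rng(0)

def build_ensemble(n_base=9, n_desc=15):
    """Train n_base classifiers, each on its own random descriptor subset."""
    members = []
    factories = [lambda: SVC(), lambda: KNeighborsClassifier(), lambda: GaussianNB()]
    for i in range(n_base):
        cols = rng.choice(X.shape[1], size=n_desc, replace=False)
        clf = factories[i % len(factories)]()
        clf.fit(X[:, cols], y)
        members.append((clf, cols))
    return members

def predict(members, X_new):
    """Majority vote over the ensemble members."""
    votes = np.stack([clf.predict(X_new[:, cols]) for clf, cols in members])
    return (votes.mean(axis=0) >= 0.5).astype(int)

ensemble = build_ensemble(n_base=9)
print("training-set accuracy:", (predict(ensemble, X) == y).mean())
```

Repeating this construction 50 times, as the excerpt describes, would simply wrap `build_ensemble` in a loop over 50 random seeds.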
“…The method uses a fixed amount of training data on the basis that a model should learn from as many samples as possible to exploit all available information. To the best of our knowledge, there have been eight other QSAR studies on ensembles of mixed features and mixed algorithms, applied to training sets of 48-816 compounds [33,50-56]. The application of the ensemble method improved the final performance in the majority of these studies when compared with the best-performing individual model [52,55].…”
Section: Introduction (mentioning)
confidence: 99%
“…[22]

Dataset        Size   Value   Task            Source
4QSAR DHFR     362    0.529   regression      4 QSAR database [22]
CPD MOUSE      442    0.152   regression      ACD DSSTox databases [23]
CPD RAT        580    0.158   regression      ACD DSSTox databases [23]
ISS MOUSE      316    0.153   regression      Benigni/Vari Carcinogenicity datasets [24]
ISS RAT        375    0.160   regression      Benigni/Vari Carcinogenicity datasets [24]
Suth COX2      414    0.493   regression      Sutherland dataset [25]
Suth DHFR      672    0.501   regression      Sutherland dataset [25]
Suth ER TOX    410    0.360   regression      Sutherland dataset [25]
FDAMDD         1216   0.270   regression      ACD DSSTox databases [26]
Fontaine       435    0.495   classification  Fontaine et al. [27]
CYP INH 2C9    700    0.298   classification  Yap and Chen [28]
CYP SUB 2C9    700    0.298   classification  Yap and Chen [28]
NCI AIDS       1000   0.331   classification  DTP AIDS Antiviral Screen [29]

Values smaller than the black horizontal line indicate that more query molecules are predicted with local models than with the global model. As can be seen from Figure 3, local models are applied more often than the global model for the majority of datasets.…”
Section: Experimental Setup and Overview of Results (mentioning)
confidence: 99%
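A small illustrative sketch of the local-versus-global comparison described above, under the assumption that the tabulated value is the fraction of query molecules handled by the global model (values below 0.5 would then mean local models dominate); the per-query assignments here are simulated, not taken from the cited study.

```python
# Fraction of queries predicted with the global model vs. local models (assumed metric).
import numpy as np

rng = np.random.default_rng(1)
# hypothetical per-query choices for one dataset: True = global model, False = local model
used_global = rng.random(442) < 0.15

global_fraction = used_global.mean()
print(f"global-model fraction: {global_fraction:.3f}")
print("local models used more often" if global_fraction < 0.5 else "global model used more often")
```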
“…A detailed description of these two [46]. An alignment reference and feature map were created by applying conformation mining to three DHFR inhibitors with available crystal structures (PDB codes 1HFQ [47], 1S3U [48], and 1KMS [49]).…”
Section: Additional Validation Experiments: DHFR and Thrombin (mentioning)
confidence: 99%
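As a starting point for reproducing this kind of setup, the sketch below fetches the three DHFR crystal structures named in the excerpt from the RCSB PDB. This is not the cited study's workflow (the conformation mining and feature-map construction were done with their own tooling); the download URL pattern and output paths are assumptions.

```python
# Fetch the DHFR inhibitor complexes 1HFQ, 1S3U, and 1KMS from the RCSB PDB.
import urllib.request
from pathlib import Path

PDB_CODES = ["1HFQ", "1S3U", "1KMS"]   # crystal structures cited above
out_dir = Path("dhfr_structures")
out_dir.mkdir(exist_ok=True)

for code in PDB_CODES:
    url = f"https://files.rcsb.org/download/{code}.pdb"
    target = out_dir / f"{code}.pdb"
    if not target.exists():                            # skip files already downloaded
        urllib.request.urlretrieve(url, str(target))   # fetch the structure file
    print(f"{code}: {target} ({target.stat().st_size} bytes)")
```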