2003
DOI: 10.1021/ci034143r

Spline-Fitting with a Genetic Algorithm:  A Method for Developing Classification Structure−Activity Relationships

Abstract: Classification methods allow for the development of structure-activity relationship models when the target property is categorical rather than continuous. We describe a classification method which fits descriptor splines to activities, with descriptors selected using a genetic algorithm. This method, which we identify as SFGA, is compared to the well-established techniques of recursive partitioning (RP) and soft independent modeling by class analogy (SIMCA) using five series of compounds: cyclooxygenase-2 (COX…
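As a hedged illustration of the approach the abstract outlines (not the authors' SFGA code), the sketch below uses a simple genetic algorithm to select descriptor subsets and scores each subset by the cross-validated accuracy of a spline-based classifier built on it. scikit-learn's SplineTransformer plus logistic regression stand in for the paper's spline-fitting step, and the synthetic data, population size, and mutation rate are illustrative assumptions.

```python
# Minimal sketch (not the authors' SFGA implementation): a genetic algorithm
# selects a subset of descriptors, and a spline expansion of the selected
# descriptors is fit to the class labels. All data and GA settings are synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                 # 200 compounds, 30 descriptors (synthetic)
y = (X[:, 3] ** 2 + X[:, 7] > 1).astype(int)   # synthetic categorical activity

def fitness(mask):
    """Cross-validated accuracy of a spline classifier on the selected descriptors."""
    if not mask.any():
        return 0.0
    model = make_pipeline(SplineTransformer(degree=3, n_knots=5),
                          LogisticRegression(max_iter=1000))
    return cross_val_score(model, X[:, mask], y, cv=5).mean()

n_desc, pop_size, n_gen = X.shape[1], 16, 10
pop = rng.random((pop_size, n_desc)) < 0.2     # initial population of descriptor masks

for gen in range(n_gen):
    scores = np.array([fitness(ind) for ind in pop])
    order = np.argsort(scores)[::-1]
    parents = pop[order[: pop_size // 2]]      # truncation selection
    children = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_desc)          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_desc) < 0.02       # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected descriptors:", np.flatnonzero(best))
```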

Cited by 159 publications (52 citation statements)
References 35 publications (60 reference statements)
“…Many SVM, kNN or naive Bayes models were generated from the same number of compounds but with differing molecular descriptor sets, referred to as vary(MDes) in the figure, owing to its efficiency and ease of use. In addition, the random method has been shown to be effective in methods such as Random Forest and Random Decision Trees [50,80]. This process was repeated 50 times, that is, there were 50 ensemble models built from each combination of 5 base classifiers, 9 base classifiers, etc.…”
Section: Modelling (mentioning)
confidence: 99%
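A minimal sketch of the ensemble strategy this citing study describes (illustrative, not the cited study's code): each base classifier (SVM, kNN, or naive Bayes) is trained on the same compounds but on a randomly drawn descriptor subset, and the ensemble predicts by majority vote. The dataset, subset size, and ensemble size below are assumptions.

```python
# Mixed-algorithm ensemble with randomly varied descriptor subsets ("vary(MDes)").
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=40, random_state=0)
rng = np.random.default_rng(0)

def build_ensemble(n_base=9, n_desc=15):
    """Train n_base classifiers, each on its own random descriptor subset."""
    members = []
    factories = [lambda: SVC(), lambda: KNeighborsClassifier(), lambda: GaussianNB()]
    for i in range(n_base):
        cols = rng.choice(X.shape[1], size=n_desc, replace=False)
        clf = factories[i % len(factories)]()
        clf.fit(X[:, cols], y)
        members.append((clf, cols))
    return members

def predict(members, X_new):
    """Majority vote over the ensemble members."""
    votes = np.stack([clf.predict(X_new[:, cols]) for clf, cols in members])
    return (votes.mean(axis=0) >= 0.5).astype(int)

ensemble = build_ensemble(n_base=9)
print("training-set accuracy:", (predict(ensemble, X) == y).mean())
```

Repeating this construction 50 times, as the excerpt describes, would simply wrap `build_ensemble` in a loop over 50 random seeds.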
“…The method uses a fixed amount of training data on the basis that a model should learn from as many samples as possible to exploit all available information. To the best of our knowledge, there have been eight other QSAR studies on ensembles of mixed features and mixed algorithms, applied to training sets of 48-816 compounds [33,50-56]. The application of the ensemble method improved the final performance in the majority of these studies when compared with the best-performing individual model [52,55].…”
Section: Introduction (mentioning)
confidence: 99%
“…[22]

Dataset        Size   Value   Task            Source
4QSAR DHFR     362    0.529   regression      4 QSAR database [22]
CPD MOUSE      442    0.152   regression      ACD DSSTox databases [23]
CPD RAT        580    0.158   regression      ACD DSSTox databases [23]
ISS MOUSE      316    0.153   regression      Benigni/Vari Carcinogenicity datasets [24]
ISS RAT        375    0.160   regression      Benigni/Vari Carcinogenicity datasets [24]
Suth COX2      414    0.493   regression      Sutherland dataset [25]
Suth DHFR      672    0.501   regression      Sutherland dataset [25]
Suth ER TOX    410    0.360   regression      Sutherland dataset [25]
FDAMDD         1216   0.270   regression      ACD DSSTox databases [26]
Fontaine       435    0.495   classification  Fontaine et al. [27]
CYP INH 2C9    700    0.298   classification  Yap and Chen [28]
CYP SUB 2C9    700    0.298   classification  Yap and Chen [28]
NCI AIDS       1000   0.331   classification  DTP AIDS Antiviral Screen [29]

Values smaller than the black horizontal line indicate that more query molecules are predicted with local models than with the global model. As can be seen from Figure 3, local models are applied more often than the global model for the majority of datasets.…”
Section: Experimental Setup and Overview of Results (mentioning)
confidence: 99%
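A small illustrative sketch of the local-versus-global comparison described above, under the assumption that the tabulated value is the fraction of query molecules handled by the global model (values below 0.5 would then mean local models dominate); the per-query assignments here are simulated, not taken from the cited study.

```python
# Fraction of queries predicted with the global model vs. local models (assumed metric).
import numpy as np

rng = np.random.default_rng(1)
# hypothetical per-query choices for one dataset: True = global model, False = local model
used_global = rng.random(442) < 0.15

global_fraction = used_global.mean()
print(f"global-model fraction: {global_fraction:.3f}")
print("local models used more often" if global_fraction < 0.5 else "global model used more often")
```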
“…A detailed description of these two [46]. An alignment reference and feature map were created by applying conformation mining to three DHFR inhibitors with available crystal structures (PDB codes 1HFQ [47], 1S3U [48], and 1KMS [49]).…”
Section: Additional Validation Experiments: DHFR and Thrombin (mentioning)
confidence: 99%
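As a starting point for reproducing this kind of setup, the sketch below fetches the three DHFR crystal structures named in the excerpt from the RCSB PDB. This is not the cited study's workflow (the conformation mining and feature-map construction were done with their own tooling); the download URL pattern and output paths are assumptions.

```python
# Fetch the DHFR inhibitor complexes 1HFQ, 1S3U, and 1KMS from the RCSB PDB.
import urllib.request
from pathlib import Path

PDB_CODES = ["1HFQ", "1S3U", "1KMS"]   # crystal structures cited above
out_dir = Path("dhfr_structures")
out_dir.mkdir(exist_ok=True)

for code in PDB_CODES:
    url = f"https://files.rcsb.org/download/{code}.pdb"
    target = out_dir / f"{code}.pdb"
    if not target.exists():                            # skip files already downloaded
        urllib.request.urlretrieve(url, str(target))   # fetch the structure file
    print(f"{code}: {target} ({target.stat().st_size} bytes)")
```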