On Selection of Training and Test Sets for the Development of Predictive QSAR models

Leonard, J. T.; Roy, Kunal

doi:10.1002/qsar.200510161

Cited by 230 publications

(117 citation statements)

References 26 publications


“…Since the performance of a QSAR model depends on the numerical values of the NOELs used in training the model (Leonard and Roy, 2006), training of a QSAR model for assessing the toxicity of cosmetic ingredients should ideally be done with the aid of datasets of potential or actual cosmetic ingredients (e.g. chemicals from the International Nomenclature of Cosmetic Ingredients (INCI) list).…”

Section: Discussion (mentioning)

confidence: 99%

Development of QSAR models using artificial neural network analysis for risk assessment of repeated-dose, reproductive, and developmental toxicities of cosmetic ingredients

et al. 2015


Use of laboratory animals for systemic toxicity testing is subject to strong ethical and regulatory constraints, but few alternatives are yet available. One possible approach to predict systemic toxicity of chemicals in the absence of experimental data is quantitative structure-activity relationship (QSAR) analysis. Here, we present QSAR models for prediction of maximum "no observed effect level" (NOEL) for repeated-dose, developmental and reproductive toxicities. NOEL values of 421 chemicals for repeated-dose toxicity, 315 for reproductive toxicity, and 156 for developmental toxicity were collected from Japan Existing Chemical Data Base (JECDB). Descriptors to predict toxicity were selected based on molecular orbital (MO) calculations, and QSAR models employing multiple independent descriptors as the input layer of an artificial neural network (ANN) were constructed to predict NOEL values. Robustness of the models was indicated by the root-mean-square (RMS) errors after 10-fold cross-validation (0.529 for repeated-dose, 0.508 for reproductive, and 0.558 for developmental toxicity). Evaluation of the models in terms of the percentages of predicted NOELs falling within factors of 2, 5 and 10 of the in-vivo-determined NOELs suggested that the model is applicable to both general chemicals and the subset of chemicals listed in International Nomenclature of Cosmetic Ingredients (INCI). Our results indicate that ANN models using in silico parameters have useful predictive performance, and should contribute to integrated risk assessment of systemic toxicity using a weight-of-evidence approach. Availability of predicted NOELs will allow calculation of the margin of safety, as recommended by the Scientific Committee on Consumer Safety (SCCS).
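The "within factors of 2, 5 and 10" evaluation mentioned in this abstract can be illustrated with a short sketch. This is not the authors' code: the function name `fraction_within_factor` and the sample values are hypothetical, and the sketch assumes observed and predicted NOELs are positive numbers in the same (non-logarithmic) units.

```python
def fraction_within_factor(observed, predicted, factor):
    """Return the share of predictions lying within a multiplicative
    factor of the observed value, i.e. observed/factor <= predicted <= observed*factor.

    Assumes all values are positive and expressed in the same units.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one value is required")
    hits = 0
    for obs, pred in zip(observed, predicted):
        ratio = pred / obs
        if 1.0 / factor <= ratio <= factor:
            hits += 1
    return hits / len(observed)


# Hypothetical NOEL values (illustrative only, not taken from the paper).
observed_noel = [10.0, 100.0, 50.0, 5.0]
predicted_noel = [15.0, 30.0, 600.0, 5.0]

for f in (2, 5, 10):
    share = fraction_within_factor(observed_noel, predicted_noel, f)
    print(f"within factor {f}: {share:.0%}")
```

If a model is trained on log-transformed NOELs, the predictions would need to be transformed back before applying this check, or the test rewritten as an absolute difference on the log scale.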


“…So, the selection of the training set is significantly important in QSAR analysis. Predictive potential of a model on the new data set is influenced by the similarity of chemical nature between training set and test set [28][29][30]. The test set molecules will be predicted well when these molecules are very similar to the training set compounds.…”

Section: Cluster Analysis and Validation (mentioning)

confidence: 99%

Predictive QSAR modeling of CCR5 antagonist piperidine derivatives using chemometric tools

Roy, Mandal. 2008. Journal of Enzyme Inhibition and Medicinal Chemistry

Self Cite


Quantitative structure-activity relationship (QSAR) studies have been performed on piperidine derivatives (n = 119) as CCR5 antagonists. The whole data set was divided into a training set (75% of the dataset) and a test set (remaining 25%) on the basis of K-means clustering technique. Models developed from the training set were used to assess the predictive potential of the models using test set compounds. Initially classical type QSAR models were developed using structural, spatial, electronic, physicochemical and/or topological parameters using statistical methods like stepwise regression, partial least squares (PLS) and factor analysis followed by multiple linear regression (FA-MLR). Using topological and structural parameters, FA-MLR provided the best equation based on internal validation (Q² = 0.514) but the best externally validated model was obtained with PLS (R²pred = 0.565). When structural, physicochemical, spatial and electronic descriptors were used, the best Q² value (0.562) was obtained from the stepwise regression derived model whereas the best R²pred value (0.571) came from the PLS model. When topological descriptors were used in combination with the structural, physicochemical, spatial and electronic descriptors, the best Q² and R²pred values obtained were 0.530 (stepwise regression) and 0.580 (PLS) respectively. Attempt was made to develop 3D-QSAR models using molecular shape analysis descriptors in combination with structural, physicochemical, spatial and electronic parameters. Linear models were developed using genetic function algorithm coupled with multiple linear regression. However, the results from the 3D-QSAR study were not superior to those of the classical QSAR models. Finally, artificial neural network was employed for development of nonlinear models. The ANN models showed acceptable values of squared correlation coefficient for the observed and predicted values of the test set compounds. From the view point of external predictability, selected ANN models were superior to the linear QSAR models. All reported models satisfy the criteria of external validation as recommended by Golbraikh and Tropsha (J Mol Graphics Mod 2002; 20: 269-276), whereas the majority of the models have modified r² (r²m) value of the test set for external validation more than 0.5 as suggested by Roy and Roy (QSAR Comb Sci 2008; 27: 302-313).
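The 75%/25% cluster-based division described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: it assumes cluster labels have already been computed (for example by K-means on the descriptor matrix), and the function name `cluster_stratified_split`, the `seed` parameter, and the random within-cluster selection are hypothetical choices.

```python
import random
from collections import defaultdict


def cluster_stratified_split(cluster_labels, test_fraction=0.25, seed=0):
    """Split compound indices into training and test sets so that each
    cluster contributes roughly `test_fraction` of its members to the test set.

    `cluster_labels[i]` is the (precomputed) cluster of compound i.
    Returns (train_indices, test_indices), both sorted.
    """
    rng = random.Random(seed)

    # Group compound indices by their cluster label.
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)

    train, test = [], []
    for label in sorted(by_cluster):
        members = by_cluster[label][:]
        rng.shuffle(members)
        # Number of test compounds drawn from this cluster.
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Hypothetical labels for 16 compounds in three clusters.
labels = [0] * 8 + [1] * 4 + [2] * 4
train_idx, test_idx = cluster_stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Drawing test compounds from every cluster keeps the test set inside the chemical space covered by the training set, which is the motivation the citing papers give for rational (rather than purely random) selection. Very small clusters may contribute no test compounds under this rounding rule.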


“…It has been indicated that to achieve the optimal model, the selection of training and test sets should be based on some rational algorithms; otherwise, poor predictive ability of QSAR models may be obtained [7]. Therefore, it is also an important step to select the group of molecules that represent the most critical structural and physicochemical features associated with activity.…”

Section: Introduction (mentioning)

confidence: 99%

3D-QSAR studies of triazolopyrimidine derivatives of Plasmodium falciparum dihydroorotate dehydrogenase inhibitors using a combination of molecular dynamics, docking, and genetic algorithm-based methods

Shah, Kumar, Tiwari, et al. 2012. J Chem Biol


A series of 35 triazolopyrimidine analogues reported as Plasmodium falciparum dihydroorotate dehydrogenase (PfDHODH) inhibitors were optimized using quantum mechanics methods, and their binding conformations were studied by docking and 3D quantitative structure-activity relationship studies. Genetic algorithm-based criteria was adopted for selection of training and test sets while maintaining structural diversity of training and test sets, which is also very crucial for model development and validation. Both the comparative molecular field analyses (q²LOO = 0.841, r²ncv = 0.99) and comparative molecular similarity indices analyses (q²LOO = 0.757, r²ncv = 0.943) show excellent correlation and high predictive power. Furthermore, molecular dynamics simulations were performed to explore the binding mode of the two of the most active compounds of the series, 10 and 14. Harmonization in the two simulation results validate the analysis and therefore applicability of docking parameters based on crystallographic conformation of compound 14 bound to receptor molecule. This work provides useful information about the inhibition mechanism of this class of molecules and will assist in the design of more potent inhibitors of PfDHODH.
