Consistency of QSAR models: Correct split of training and test sets, ranking of models and performance parameters

Rácz, Anita; Bajusz, Dávid; Héberger, Károly

doi:10.1080/1062936x.2015.1084647

Cited by 91 publications

(52 citation statements)

References 41 publications


“…ref. [62]. It is clear that in this case in silico methods are close to the recommended logKOW (exp) values, while chromatographic estimations might seem to perform worse.…”

Section: Comparison of Lipophilicity Measures by Means of SRD and GPCM (mentioning)

confidence: 65%

“…Independently from this, our recent paper clearly shows that the ordering of merits for external validation is indistinguishable from random ranking [62]. Nevertheless, we have carried out the SRD and GPCM ranking of lipophilicity measures on a subset of compounds with logKOW values that are likely to have been measured correctly with the shake-flask method (logKOW < 3 and determined with the shake-flask procedure, as verified by meticulously tracing the original articles; Table S1, Supplementary material).…”

Section: Comparison of Lipophilicity Measures by Means of SRD and GPCM (mentioning)

confidence: 99%
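The claim that an ordering is "indistinguishable from random ranking" can be checked by comparing an observed SRD value with the SRD values that random rankings produce. The following is a minimal sketch of that idea, not the authors' implementation: it assumes no tied values, approximates the random distribution with a simple permutation loop, and the function names and example data are hypothetical. Because ties are excluded, the distribution of random SRD values does not depend on the order of the reference, so the reference ranking 1..n is used.

```python
import random

def ranks(values):
    """Ascending ranks (1 = smallest); assumes no tied values for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, index in enumerate(order, start=1):
        r[index] = position
    return r

def srd(candidate, reference):
    """Sum of absolute differences between candidate and reference ranks."""
    return sum(abs(a - b) for a, b in zip(ranks(candidate), ranks(reference)))

def fraction_random_at_or_below(observed, n, n_perm=10000, seed=0):
    """Share of random rankings of n objects whose SRD against a fixed reference
    ranking is <= observed. A value near 0 means the candidate agrees with the
    reference better than chance; a value that is not small means the candidate
    cannot be distinguished from a random ranking."""
    rng = random.Random(seed)
    reference = list(range(1, n + 1))
    perm = reference[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if sum(abs(p - q) for p, q in zip(perm, reference)) <= observed:
            hits += 1
    return hits / n_perm

# Hypothetical reference and candidate values for eight compounds
reference_values = [0.5, 1.2, 2.0, 2.8, 3.1, 3.9, 4.4, 5.0]
candidate_values = [0.7, 1.0, 2.5, 2.2, 3.3, 4.1, 3.8, 5.2]
observed = srd(candidate_values, reference_values)
print(observed, fraction_random_at_or_below(observed, len(reference_values)))
```

A permutation loop is used here only because it is short and transparent; for small n the exact distribution of random SRD values could be enumerated instead.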


Multivariate assessment of lipophilicity scales—computational and reversed phase thin-layer chromatographic indices

Andrić, Bajusz, Rácz et al. 2016

Journal of Pharmaceutical and Biomedical Analysis


The need for fast yet reliable means of assessing the lipophilicity of diverse compounds has driven the development of various in silico and chromatographic approaches that are faster, cheaper, and greener than the traditional shake-flask method. However, no accepted "standard" approach currently exists for comparing them and selecting the most appropriate one(s). This is of utmost importance when developing new lipophilicity indices or assessing the lipophilicity of newly synthesized compounds. In this study, 50 well-known, diverse compounds of significant pharmaceutical and environmental importance were selected and examined. Octanol-water partition coefficients were measured with the shake-flask method for most of them. Their retention was studied in typical reversed-phase thin-layer chromatographic systems, involving the most frequently employed stationary phases (octadecyl- and cyano-modified silica), with acetonitrile and methanol as mobile-phase constituents. Twelve computationally estimated logP values and twenty chromatographic indices, together with the shake-flask octanol-water partition coefficient, were investigated with classical chemometric approaches, such as principal component analysis (PCA), hierarchical cluster analysis (HCA), and Pearson's and Spearman's correlation matrices, as well as with the novel non-parametric methods sum of ranking differences (SRD) and generalized pairwise correlation method (GPCM). Novel SRD and GPCM methods were introduced based on Comparisons with One VAriable (lipophilicity metric) at a Time (COVAT), and a heatmap format was introduced to visualize the COVAT results. Analysis of variance (ANOVA) was applied to reveal the dominant factors distinguishing computational logP values from the various chromatographic measures.
In consensus-based comparisons, the shake-flask method performed best, closely followed by the computational estimates, while the chromatographic estimates often overlap with the in silico assessments, mostly for methods involving octadecyl-modified silica stationary phases. Methods employing cyano-modified silica generally perform worse. Alternative coloring schemes for the covariance matrices and SRD/GPCM heatmaps enable the discovery of intrinsic relationships among lipophilicity scales and the selection of the best and worst measures. Closest to the recommended logKOW values are ClogP and the first principal component scores obtained on an octadecyl-silica stationary phase with a methanol-water mobile phase, while slopes derived from the Soczewinski-Matysik equation should be avoided.
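A consensus-based SRD comparison of the kind described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: the columns are autoscaled before their row-wise mean is taken as the consensus reference, every column is assumed to be oriented so that larger values mean higher lipophilicity, tied values receive their average rank, and SRD is reported as a percentage of the maximum possible SRD for n objects. The function names and the example data are hypothetical.

```python
from statistics import mean, pstdev

def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def autoscale(column):
    """Centre to zero mean and scale to unit (population) standard deviation."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]

def srd_against_consensus(table):
    """table maps method name -> values for the same compounds, in the same order.
    Returns method name -> SRD as a percentage of the maximum possible SRD."""
    names = list(table)
    n = len(table[names[0]])
    scaled = {name: autoscale(table[name]) for name in names}
    # Consensus reference: row-wise mean of the autoscaled columns
    reference = [mean(scaled[name][i] for name in names) for i in range(n)]
    ref_ranks = average_ranks(reference)
    # Largest SRD for n objects: one ranking exactly reversed
    max_srd = sum(abs(r - (n + 1 - r)) for r in range(1, n + 1))
    result = {}
    for name in names:
        # Autoscaling does not change a column's ranks, so raw values are ranked
        srd = sum(abs(a - b) for a, b in zip(average_ranks(table[name]), ref_ranks))
        result[name] = 100 * srd / max_srd
    return result

# Hypothetical values for five compounds obtained by three methods
example = {
    "shake_flask": [0.9, 1.5, 2.2, 3.0, 3.4],
    "ClogP": [1.0, 1.4, 2.5, 2.9, 3.6],
    "RM0_CN": [0.3, 0.9, 0.7, 1.4, 1.2],
}
print(srd_against_consensus(example))
```

A method with a normalized SRD close to 0 orders the compounds almost exactly as the consensus does. A constant column would make autoscaling divide by zero, so such columns should be removed beforehand.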


“…The values are presented together with computationally estimated logKOC values in the … In order to identify the best and the worst logKOC determination methods, the non-parametric SRD comparison was applied to the entire set of logKOC values. The SRD method has already been successfully employed to rank and group variables, finding statistically significant differences even when the variables are highly correlated [40][41][42][43][44][45], which is the case with the present set of logKOC values.…”

Section: Determination of logKOC Values and Comparison of Chromatogr… (mentioning)

confidence: 90%

Linear modeling of the soil-water partition coefficient normalized to organic carbon content by reversed-phase thin-layer chromatography

Andrić, Šegan, Dramićanin et al. 2016

Journal of Chromatography A


The soil-water partition coefficient normalized to the organic carbon content (KOC) is one of the crucial properties influencing the fate of organic compounds in the environment. Chromatographic methods are a well-established alternative to the direct sorption techniques used for KOC determination. The present work proposes reversed-phase thin-layer chromatography (RP-TLC) as a simpler, yet equally accurate, alternative to the officially recommended HPLC technique. Several TLC systems were studied, including octadecyl- (RP18) and cyano- (CN) modified silica layers in combination with methanol-water and acetonitrile-water mixtures as mobile phases. In total, 50 compounds of different molecular shape and size, and with various abilities to establish specific interactions, were selected (phenols, benzodiazepines, triazine herbicides, and polyaromatic hydrocarbons). A calibration set of 29 compounds with known logKOC values determined by sorption experiments was used to build simple univariate calibrations, principal component regression (PCR), and partial least squares (PLS) models between logKOC and TLC retention parameters. The models exhibit good statistical performance and indicate that CN-layers contribute more to logKOC modeling than RP18-silica. The most promising TLC methods, the officially recommended HPLC method, and four in silico estimation approaches were compared by the non-parametric sum of ranking differences (SRD) approach. The best estimates of logKOC values were achieved by simple univariate calibration of TLC retention data involving CN-silica layers and a moderate methanol content (40-50% v/v). These were ranked far better than the officially recommended HPLC method, which was ranked in the middle. The worst estimates were obtained from in silico computations based on the octanol-water partition coefficient.
A linear solvation energy relationship (LSER) study revealed that the increased polarity of CN-layers over RP18, in combination with methanol-water mixtures, is the key to better modeling of logKOC, because it significantly diminishes the dipolar and proton-accepting influence of the mobile phase and enhances the excess molar refractivity of the chromatographic systems.
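A simple univariate calibration of the kind described above amounts to fitting a straight line between logKOC and a single retention parameter. As a hedged sketch, the example assumes that the retention parameter is RM = log10(1/RF - 1), a common TLC quantity; the abstract does not say which retention parameters were used, and the helper names and numbers below are hypothetical.

```python
import math
from statistics import mean

def rm_from_rf(rf):
    """Convert a retardation factor RF (0 < RF < 1) to RM = log10(1/RF - 1)."""
    return math.log10(1 / rf - 1)

def fit_univariate(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - rss / tss

# Hypothetical calibration set: measured RF values and known logKOC values
rf_values = [0.80, 0.65, 0.50, 0.35, 0.20]
logkoc = [1.6, 2.1, 2.5, 3.0, 3.5]
rm_values = [rm_from_rf(rf) for rf in rf_values]
a, b, r2 = fit_univariate(rm_values, logkoc)
print(f"logKOC = {a:.2f} + {b:.2f} * RM (R2 = {r2:.3f})")
# Predict logKOC for a new compound from its measured RF
print("prediction for RF = 0.45:", a + b * rm_from_rf(0.45))
```

Multivariate alternatives such as the PCR and PLS models mentioned in the abstract would replace the single retention parameter with several, but the calibrate-then-predict workflow is the same.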


“…Here, N is the number of compounds in the training set, R² is the coefficient of determination, R²adj is the adjusted R², s is the standard error of estimate, F is the variance ratio, LOF is Friedman's lack of fit [41,42], Kxx is the correlation among descriptors [38], ΔK is the difference between the correlation among the descriptors (Kxx) and that among the descriptors plus the responses (Kxy), RMSEtr is the root mean square error in fitting (training set), MAEtr is the mean absolute error in fitting (calculated on the training set), RSStr is the residual sum of squares in fitting (also for the training set), and CCCtr is the concordance correlation coefficient calculated over the training set [43,44,45]. The model has an R² value of 0.8737, which indicates an adequate fit for modelling Syk protein inhibition.…”

Section: QSAR Model Construction and Validation (mentioning)

confidence: 99%
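Most of the fitting statistics listed in this excerpt can be computed directly from the observed and fitted responses. The sketch below assumes a linear model with p descriptors and an intercept, fitted by least squares on the N training compounds, and uses Lin's definition of the concordance correlation coefficient with population (1/N) moments. It omits LOF, Kxx and ΔK, whose exact definitions depend on parameters not given here. The function name is hypothetical.

```python
import math

def fit_statistics(y_obs, y_fit, p):
    """Training-set statistics for a least-squares model with p descriptors.
    Requires len(y_obs) > p + 1 so that the residual degrees of freedom are positive."""
    n = len(y_obs)
    dof = n - p - 1  # residual degrees of freedom
    mean_obs = sum(y_obs) / n
    mean_fit = sum(y_fit) / n
    residuals = [o - f for o, f in zip(y_obs, y_fit)]
    rss = sum(r * r for r in residuals)
    tss = sum((o - mean_obs) ** 2 for o in y_obs)
    r2 = 1 - rss / tss
    # Moments for Lin's concordance correlation coefficient
    var_obs = tss / n
    var_fit = sum((f - mean_fit) ** 2 for f in y_fit) / n
    cov = sum((o - mean_obs) * (f - mean_fit) for o, f in zip(y_obs, y_fit)) / n
    return {
        "N": n,
        "R2": r2,
        "R2adj": 1 - (1 - r2) * (n - 1) / dof,
        "s": math.sqrt(rss / dof),  # standard error of estimate
        "F": (r2 / p) / ((1 - r2) / dof) if r2 < 1 else math.inf,
        "RMSE": math.sqrt(rss / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "RSS": rss,
        "CCC": 2 * cov / (var_obs + var_fit + (mean_obs - mean_fit) ** 2),
    }

# Hypothetical observed and fitted activities for a one-descriptor model
stats = fit_statistics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], p=1)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Unlike R², the CCC penalizes both scatter and systematic offset between observed and fitted values, which is why it is often reported alongside R².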

Untitled

2018

IJPSR


Spleen tyrosine kinase (Syk) is a member of the tyrosine kinase protein family. Syk plays a vital role in intracellular signal transduction from the high-affinity IgE receptor (FcεRI) in allergic reactions. Flavonoids are well known for their anti-allergic properties. In the present work, thirty-four structurally similar flavonoids were investigated as Syk inhibitors using three-dimensional quantitative structure-activity relationship (QSAR) models and molecular docking studies. By applying a genetic algorithm (… A pharmacokinetic study of chrysin shows that its gastrointestinal absorption is high, with a bioavailability score of 0.55. The current work will help in the design of human Syk inhibitors and provides information on the molecular-level interactions between Syk and flavonoids.

