Effect of Molecular Descriptor Feature Selection in Support Vector Machine Classification of Pharmacokinetic and Toxicological Properties of Chemical Agents

Yang, Xue; Li, Zerong; Yap, Chun Wei; Sun, Lin; Chen, Xin; Chen, Yu Zong

doi:10.1021/ci049869h

Cited by 153 publications

(187 citation statements)

References 75 publications

(189 reference statements)

Supporting

Mentioning

182

Contrasting


“…The choice of 50% was arbitrary although it has also been used in previous studies (Niwa 2003). There have been a number of different cutoffs used, from 10% (Palm et al 1997) up to 70% (Xue et al 2004), with no standard defined. Table 4, referring to models built from training set TS1, shows that for the classification of the validation set the best overall classification accuracy was 0.958 (481/502), the highest specificity value was 0.952 (441/460) and the best sensitivity was 0.959 (40/42), all using model 3.…”

Section: Classification Analysis; citation type: mentioning

confidence: 99%
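The fractions in the statement above can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the citing paper; it assumes the denominators give the class sizes of the validation set (42 positive and 460 negative compounds), and the variable names are illustrative only.

```python
# Counts taken from the fractions in the quoted statement (validation set, model 3).
# Assumption: 42 compounds in the positive class, 460 in the negative class.
tp, n_pos = 40, 42      # correctly classified positives / all positives
tn, n_neg = 441, 460    # correctly classified negatives / all negatives

sensitivity = tp / n_pos                  # true positive rate
specificity = tn / n_neg                  # true negative rate
accuracy = (tp + tn) / (n_pos + n_neg)    # overall fraction classified correctly

print(f"sensitivity = {sensitivity:.3f}")  # 40/42
print(f"specificity = {specificity:.3f}")  # 441/460
print(f"accuracy    = {accuracy:.3f}")     # 481/502
```

Evaluated this way, 40/42 rounds to 0.952 and 441/460 rounds to 0.959, so the two decimal values in the statement as transcribed here appear to be attached to the opposite fractions; the overall accuracy of 0.958 (481/502) is consistent with the counts.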

The impact of training set data distributions for modelling of passive intestinal absorption

Ghafourian

Freitas

Newby

2012

International Journal of Pharmaceutics


This study presents regression and classification models to predict human intestinal absorption of 645 drug and drug like compounds using percentage human intestinal values from the published dataset by Hou et al (2007). The problem with this dataset and other datasets in the literature is there are more highly than poorly absorbed compounds. Any models developed using these datasets will be biased towards highly absorbed compounds

“…Area under ROC curve was found to be 0.967, whereas Youden's index was calculated as 0.84. Quite a few researchers have been tried to generate absorption models using different machine learning approaches and reported good results [6,24]. This is the first time we are presenting a comparative study between three potential machine learning approaches viz.…”

Section: Results; citation type: mentioning

confidence: 95%
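For reference, Youden's index is conventionally defined from sensitivity and specificity at a chosen operating point. The statement above does not report the two rates separately, so only their sum can be inferred from the quoted value.

```latex
J = \text{sensitivity} + \text{specificity} - 1,
\qquad
J = 0.84 \;\Longrightarrow\; \text{sensitivity} + \text{specificity} = 1.84
```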

Classification of oral bioavailability of drugs by machine learning approaches: a comparative study.

Kumar

Sharma

Varadwaj

et al. 2012

JCIS


Oral Bioavailability is the rate and extent to which an active drug substance is absorbed and becomes available to the general circulation. A computational model for the prediction of oral bioavailability is a vital initial step in the drug discovery. It is decisive for selecting the promising compounds for the next level optimizations and recognition for the clinical trials. In the present investigation we aimed to perform the oral bioavailability prediction by comparing three machine learning methods i.e. Support Vector Machine (SVM) based kernel learning, Artificial Neural Network (ANN) and Bayesian classification approach. The overall prediction efficiency of SVM based model for the test set was 96.85%, whereas according to the Bayesian classifier and ANN methods prediction efficiency was found to be 92.19% and 94.53% respectively. Thus the present results clearly suggested that the SVM based prediction of oral bioavailability of drugs is more efficient binary classification approach for the data under consideration.


“…59 The problem of selecting properties which are responsible for given outputs occurs in various machine learning applications. [60][61][62] We use feature selection methods with the objective to detect features that are responsible for the underlying class structure. In addition, we search for feature combinations that reflect or even outperform results using all features.…”

Section: Machine Learning Techniques; citation type: mentioning

confidence: 99%

Classification of Highly Unbalanced CYP450 Data of Drugs Using Cost Sensitive Machine Learning Techniques.

Eitrich

Kless

Druska

et al. 2007

ChemInform


In this paper, we study the classifications of unbalanced data sets of drugs. As an example we chose a data set of 2D6 inhibitors of cytochrome P450. The human cytochrome P450 2D6 isoform plays a key role in the metabolism of many drugs in the preclinical drug discovery process. We have collected a data set from annotated public data and calculated physicochemical properties with chemoinformatics methods. On top of this data, we have built classifiers based on machine learning methods. Data sets with different class distributions lead to the effect that conventional machine learning methods are biased toward the larger class. To overcome this problem and to obtain sensitive but also accurate classifiers we combine machine learning and feature selection methods with techniques addressing the problem of unbalanced classification, such as oversampling and threshold moving. We have used our own implementation of a support vector machine algorithm as well as the maximum entropy method. Our feature selection is based on the unsupervised McCabe method. The classification results from our test set are compared structurally with compounds from the training set. We show that the applied algorithms enable the effective high throughput in silico classification of potential drug candidates.
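The two rebalancing ideas named in this abstract, oversampling and threshold moving, can be illustrated with a small sketch. It is not the authors' implementation (the abstract says they used their own support vector machine code); the function names `oversample_minority` and `move_threshold`, the toy scores, and the use of balanced accuracy as the selection criterion are all assumptions made for illustration.

```python
import random

def oversample_minority(X, y, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal size.
    Assumes binary labels (0/1) and at least one example of each class."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def move_threshold(scores, y):
    """Return the score cutoff (predict 1 when score >= cutoff) that maximizes
    balanced accuracy, i.e. the mean of sensitivity and specificity."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    best_t, best_bal = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, label in zip(scores, y) if s >= t and label == 1)
        tn = sum(1 for s, label in zip(scores, y) if s < t and label == 0)
        bal = 0.5 * (tp / n_pos + tn / n_neg)
        if bal > best_bal:
            best_t, best_bal = t, bal
    return best_t, best_bal

# Toy, made-up classifier scores: 6 majority-class and 2 minority-class compounds.
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45]
y = [0, 0, 0, 0, 0, 0, 1, 1]

cutoff, balanced_accuracy = move_threshold(scores, y)
X_balanced, y_balanced = oversample_minority([[s] for s in scores], y)
print(cutoff, balanced_accuracy, len(y_balanced), sum(y_balanced))
```

With a conventional cutoff of 0.5 every compound in this toy set would be assigned to the larger class. Moving the cutoff down recovers the minority class, and oversampling addresses the same bias at training time by equalizing the class sizes.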

