Human pharmacokinetics is of great significance in the selection of drug candidates, and in silico estimation of pharmacokinetic parameters in the early stage of drug development has become a trend in drug research owing to its time- and cost-saving advantages. Herein, quantitative structure-property relationship studies were carried out to predict four human pharmacokinetic parameters, namely volume of distribution at steady state (VDss), clearance (CL), terminal half-life (t1/2), and fraction unbound in plasma (fu), using a data set of 1352 drugs. A series of regression models were built using features selected by the Boruta algorithm and four machine learning methods: support vector machine (SVM), random forest (RF), gradient boosting machine (GBM), and XGBoost (XGB). For VDss, SVM showed the best performance, with test-set R² = 0.870 and RMSE = 0.208. For the other three pharmacokinetic parameters, the RF models gave the best prediction accuracy (for CL, test-set R² = 0.875 and RMSE = 0.103; for t1/2, test-set R² = 0.832 and RMSE = 0.154; for fu, test-set R² = 0.818 and RMSE = 0.291). Assessed by 10-fold cross-validation, leave-one-out cross-validation, a Y-randomization test, and applicability domain evaluation, these models demonstrated excellent stability and predictive ability. Comparison with other published models for estimating human pharmacokinetic parameters further indicated that our models had better predictive ability and could be used in the selection of preclinical candidates.
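The two reported metrics (R² and RMSE) and the Y-randomization check can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, a one-descriptor least-squares line stands in for the SVM and RF models, and all function and variable names are hypothetical. The idea behind Y-randomization is that a model refitted on shuffled response values should score far lower than the model fitted on the real values; if it does not, the original fit is likely a chance correlation.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def fit_line(x, y):
    """One-descriptor least squares; an assumed stand-in for SVM/RF."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def y_randomization(x, y, n_rounds=20, seed=0):
    """Refit on shuffled responses and collect the resulting R² values."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = y[:]
        rng.shuffle(shuffled)  # break the descriptor-response link
        slope, intercept = fit_line(x, shuffled)
        pred = [slope * a + intercept for a in x]
        scores.append(r2_score(shuffled, pred))
    return scores


# Synthetic data (assumption): one descriptor with a noisy linear response.
rng = random.Random(42)
x = [rng.uniform(0, 5) for _ in range(200)]
y = [0.8 * a + rng.gauss(0, 0.3) for a in x]

slope, intercept = fit_line(x, y)
pred = [slope * a + intercept for a in x]
real_r2 = r2_score(y, pred)
real_rmse = rmse(y, pred)
random_r2 = y_randomization(x, y)

print(f"real R2 = {real_r2:.3f}, RMSE = {real_rmse:.3f}")
print(f"max R2 after Y-randomization = {max(random_r2):.3f}")
```

In a real study the metrics would be computed on a held-out test set rather than on the training data, as the abstract does.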
Microarray data are often extremely asymmetric in dimensionality, with thousands or even tens of thousands of genes but only a few hundred samples or fewer. Such extreme asymmetry between the number of genes and the number of samples can lead to inaccurate diagnosis of disease in the clinic. It has been shown that selecting a small set of marker genes can improve classification accuracy. In this paper, a simple modified ant colony optimization (ACO) algorithm is proposed to select tumor-related marker genes, and a support vector machine (SVM) is used as the classifier to evaluate the performance of the extracted gene subset. Experimental results on several benchmark tumor microarray datasets showed that the proposed approach achieves better recognition with fewer marker genes than many other methods. The results demonstrate that the modified ACO is a useful tool for selecting marker genes and mining high-dimensional data.
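The general idea of ACO-based feature selection can be sketched as follows. This is a generic ACO loop, not the modified algorithm proposed in the paper, whose details the abstract does not give. Each ant samples a fixed-size gene subset with probability proportional to pheromone, pheromone evaporates every iteration, and the best ant of the iteration deposits pheromone on its genes. The fitness function here is a toy placeholder; in the setting described above it would be the accuracy of an SVM trained on the chosen genes. All names and parameter values are hypothetical.

```python
import random


def aco_select(n_features, fitness, subset_size=3, n_ants=10, n_iters=30,
               evaporation=0.2, seed=0):
    """Generic ACO subset search; returns (best_subset, best_score)."""
    rng = random.Random(seed)
    pheromone = [1.0] * n_features
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iters):
        iteration_results = []
        for _ in range(n_ants):
            # Sample features without replacement, weighted by pheromone.
            candidates = list(range(n_features))
            subset = []
            for _ in range(subset_size):
                weights = [pheromone[i] for i in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                subset.append(pick)
                candidates.remove(pick)
            score = fitness(subset)
            iteration_results.append((score, subset))
            if score > best_score:
                best_score, best_subset = score, sorted(subset)

        # Evaporate, keeping a small floor so every feature stays reachable.
        pheromone = [max((1.0 - evaporation) * p, 0.01) for p in pheromone]

        # The best ant of this iteration reinforces its features.
        it_score, it_subset = max(iteration_results, key=lambda r: r[0])
        for i in it_subset:
            pheromone[i] += it_score

    return best_subset, best_score


# Toy fitness (assumption): fraction of selected features that belong to a
# hidden "informative" set. A real run would use SVM accuracy instead.
INFORMATIVE = {2, 7, 11}


def toy_fitness(subset):
    return sum(1 for i in subset if i in INFORMATIVE) / len(subset)


best_subset, best_score = aco_select(n_features=20, fitness=toy_fitness)
print("selected features:", best_subset, "score:", round(best_score, 3))
```

Fixing the subset size keeps the sketch short; letting ants choose how many genes to include, or penalizing larger subsets in the fitness, would be closer to the stated goal of using fewer marker genes.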