Machine learning methods have been applied to many datasets in pharmaceutical research for several decades. The relative ease and availability of fingerprint-type molecular descriptors, paired with Bayesian methods, resulted in the widespread use of this approach for a diverse array of endpoints relevant to drug discovery. Deep learning is the latest machine learning approach to attract attention for many pharmaceutical applications, from docking to virtual screening. Deep learning is based on artificial neural networks with multiple hidden layers and has gained considerable traction in many artificial intelligence applications. We have previously suggested the need for a comparison of different machine learning methods with deep learning across an array of varying datasets applicable to pharmaceutical research. Endpoints relevant to pharmaceutical research include absorption, distribution, metabolism, excretion and toxicity (ADME/Tox) properties, as well as activity against pathogens and drug discovery datasets. In this study, we have used datasets for solubility, probe-likeness, hERG, KCNQ1, bubonic plague, Chagas, tuberculosis and malaria to compare different machine learning methods using FCFP6 fingerprints. These datasets represent whole-cell screens, individual proteins, physicochemical properties, as well as a dataset with a complex endpoint. Our aim was to assess whether deep learning offered any improvement in testing when assessed using an array of metrics including AUC, F1 score, Cohen’s kappa, Matthews correlation coefficient and others. Based on ranked normalized scores for the metrics or datasets, Deep Neural Networks (DNN) ranked higher than support vector machines (SVM), which in turn ranked higher than all the other machine learning methods. Visualizing these properties for training and test sets using radar-type plots indicates when models are inferior or perhaps overtrained.
These results also suggest the need to assess deep learning further using multiple metrics, much larger-scale comparisons, prospective testing, and fingerprints and DNN architectures beyond those used here.
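The abstract does not spell out how the "ranked normalized scores" are computed. As a minimal sketch, assuming each metric is min-max normalized across methods (best method scores 1, worst scores 0) and the normalized values are then averaged per method, the ranking step could look like the following. The function name and the method names in any input are hypothetical; this is not presented as the authors' exact procedure.

```python
from statistics import mean


def normalized_rank_scores(scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank methods by their mean min-max-normalized metric values.

    Hypothetical helper. scores[method][metric] holds one value per metric,
    where higher is better for every metric, and every method must report
    every metric. Returns (method, score) pairs sorted best-first.
    """
    methods = list(scores)
    metrics = sorted({metric for per_method in scores.values() for metric in per_method})
    normalized: dict[str, list[float]] = {name: [] for name in methods}
    for metric in metrics:
        values = [scores[name][metric] for name in methods]
        low, high = min(values), max(values)
        span = high - low
        for name, value in zip(methods, values):
            # A metric on which every method ties carries no ranking signal.
            normalized[name].append((value - low) / span if span else 0.0)
    overall = {name: mean(values) for name, values in normalized.items()}
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)
```

Min-max normalization puts metrics with different ranges (for example AUC and kappa) on a common 0 to 1 scale before averaging; a rank-based normalization would be an equally plausible reading of the abstract.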
Tuberculosis is a global health dilemma. In 2016, the WHO reported 10.4 million new cases and 1.7 million deaths. The need to develop new treatments for those infected with Mycobacterium tuberculosis (Mtb) has led to many large-scale phenotypic screens and many thousands of new active compounds identified in vitro. However, with limited funding, efforts to discover new active molecules against Mtb need to be more efficient. Several computational machine learning approaches have been shown to have good enrichment and hit rates. We have curated small molecule Mtb data and developed new models with a total of 18,886 molecules with activity cutoffs of 10 μM, 1 μM, and 100 nM. These data sets were used to evaluate different machine learning methods (including deep learning) and metrics and to generate predictions for additional molecules published in 2017. One Mtb model, a Bayesian model built from combined in vitro and in vivo data at a 100 nM activity cutoff, yielded the following metrics for 5-fold cross validation: accuracy = 0.88, precision = 0.22, recall = 0.91, specificity = 0.88, kappa = 0.31, and MCC = 0.41. We have also curated an evaluation set (n = 153 compounds) published in 2017, and when used to test our model, it showed comparable statistics (accuracy = 0.83, precision = 0.27, recall = 1.00, specificity = 0.81, kappa = 0.36, and MCC = 0.47). We have also compared these models with additional machine learning algorithms, showing that Bayesian machine learning models constructed with literature Mtb data generated by different laboratories were generally equivalent to or outperformed deep neural networks on external test sets. Finally, we have also compared our training and test sets to show they were suitably diverse and different in order to represent useful evaluation sets. Such Mtb machine learning models could help prioritize compounds for testing in vitro and in vivo.
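The metrics quoted above (accuracy, precision, recall, specificity, kappa and MCC, plus the F1 score mentioned earlier) all follow from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` is hypothetical, and any counts passed to it are illustrative rather than the study's data.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Any ratio whose denominator is zero is reported as 0.0.
    """
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    n = tp + fp + fn + tn
    accuracy = ratio(tp + tn, n)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # sensitivity, true-positive rate
    specificity = ratio(tn, tn + fp)   # true-negative rate
    f1 = ratio(2 * precision * recall, precision + recall)

    # Cohen's kappa: observed agreement corrected for the agreement expected
    # by chance, estimated from the predicted and actual class marginals.
    expected = ratio((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), n * n)
    kappa = ratio(accuracy - expected, 1 - expected)

    # Matthews correlation coefficient.
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }
```

These definitions help explain how, when actives are rare (as at a 100 nM cutoff), a model can recover nearly all actives (high recall) and reject most inactives (high specificity) while precision stays low: even a small false-positive rate applied to many inactives can outnumber the true positives.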
Raman spectroscopy (RS) has been used as a technique for the characterization of well-aligned IrO
Background The logarithmic acid dissociation constant pKa reflects the ionization of a chemical, which affects lipophilicity, solubility, protein binding, and the ability to pass through the plasma membrane. Thus, pKa affects chemical absorption, distribution, metabolism, excretion, and toxicity properties. Multiple proprietary software packages exist for the prediction of pKa, but to the best of our knowledge no free and open-source programs exist for this purpose. Using a freely available data set and three machine learning approaches, we developed open-source models for pKa prediction.
Methods The experimental strongest acidic and strongest basic pKa values in water for 7912 chemicals were obtained from DataWarrior, a freely available software package. Chemical structures were curated and standardized for quantitative structure–activity relationship (QSAR) modeling using KNIME, and a subset comprising 79% of the initial set was used for modeling. To evaluate different modeling approaches, several datasets were constructed based on different processing of chemical structures with acidic and/or basic pKas. Continuous molecular descriptors, binary fingerprints, and fragment counts were generated using PaDEL, and pKa prediction models were created using three machine learning methods: (1) support vector machines (SVM) combined with k-nearest neighbors (kNN), (2) extreme gradient boosting (XGB), and (3) deep neural networks (DNN).
Results The three methods delivered comparable performance on the training and test sets, with a root-mean-squared error (RMSE) of around 1.5 and a coefficient of determination (R²) of around 0.80. Two commercial pKa predictors from ACD/Labs and ChemAxon were used to benchmark the three best models developed in this work, and our models compared favorably with the commercial products.
Conclusions This work provides multiple QSAR models to predict the strongest acidic and strongest basic pKas of chemicals, built using publicly available data, and provided as free and open-source software on GitHub.
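The two regression metrics reported for the pKa models can be computed as below. This minimal sketch assumes R² is the coefficient of determination defined as 1 − SS_res/SS_tot (some studies instead report the squared Pearson correlation under the same name); the function names are illustrative.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-squared error between observed and predicted values."""
    _check(y_true, y_pred)
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

Because RMSE is in pKa units, an RMSE of about 1.5 means a typical prediction error of roughly one and a half log units, whereas R² is unitless and depends on the spread of the observed pKa values.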
The purpose of this work is to study the antimetastasis activity of gadolinium metallofullerenol nanoparticles (f-NPs) in malignant and invasive human breast cancer models. We demonstrated that f-NPs inhibited the production of matrix metalloproteinase (MMP) enzymes and further interfered with the invasiveness of cancer cells under tissue culture conditions. In the tissue invasion animal model, invasive primary tumors treated with f-NPs showed significantly less metastasis to the ectopic site, along with decreased MMP expression. In the same animal model, we observed the formation of a fibrous cage that may serve as a physical barrier capable of encapsulating cancer tissue, cutting off communication between cancer cells and the tumor-associated macrophages that produce MMP enzymes. In another animal model, the blood transfer model, f-NPs potently suppressed the establishment of tumor foci in the lung. Based on these data, we conclude that f-NPs have antimetastasis effects and speculate that f-NPs may provide a new strategy for the treatment of tumor metastasis.
Overexpression of cyclooxygenase‐2 (COX‐2) in oral mucosa has been associated with increased risk of head and neck squamous cell carcinoma (HNSCC). Celecoxib is a nonsteroidal anti‐inflammatory drug that inhibits COX‐2 but not COX‐1. This selective COX‐2 inhibitor holds promise as a cancer preventive agent; however, concerns about the cardiotoxicity of celecoxib limit its use in long‐term chemoprevention and therapy. Salvianolic acid B (Sal‐B) is a leading bioactive component of Salvia miltiorrhiza Bge, which is used for treating neoplastic and chronic inflammatory diseases in China. The purpose of this study was to investigate the mechanisms by which Sal‐B inhibits HNSCC growth. Sal‐B was isolated from S. miltiorrhiza Bge by solvent extraction followed by two chromatographic steps. The pharmacological activity of Sal‐B was assessed in HNSCC and other cell lines by measuring COX‐2 expression, cell viability and caspase‐dependent apoptosis. Sal‐B inhibited the growth of HNSCC JHU‐022 and JHU‐013 cells with IC50 values of 18 and 50 μM, respectively. Nude mice bearing HNSCC solid tumor xenografts were treated with Sal‐B (80 mg/kg/day) or celecoxib (5 mg/kg/day) for 25 days to investigate the in vivo effects of the COX‐2 inhibitors. Tumor volumes in the Sal‐B‐treated group were significantly lower than those in the celecoxib‐treated or untreated control groups (p < 0.05). Sal‐B inhibited COX‐2 expression in cultured HNSCC cells and in HNSCC cells isolated from tumor xenografts. Sal‐B also caused dose‐dependent inhibition of prostaglandin E2 synthesis, with or without lipopolysaccharide stimulation. Taken together, Sal‐B shows promise as a COX‐2‐targeted anticancer agent for HNSCC prevention and treatment. © 2008 Wiley‐Liss, Inc.
The development of new functional materials and chemical compounds using machine learning (ML) techniques is an active research topic that involves several crucial steps, one of which is the choice of chemical structure representation. The classical approach of rigorous feature engineering in ML typically improves the performance of the predictive model, but at the same time it narrows the scope of applicability and decreases the physical interpretability of predicted results. In this study, we present graph convolutional neural networks (GCNNs) as an architecture that can successfully predict the properties of compounds from diverse domains of chemical space using a minimal set of meaningful descriptors. The applicability of GCNN models has been demonstrated on a wide range of domain-specific chemical properties. Their performance is comparable to state-of-the-art techniques; however, this architecture removes the need for precise feature engineering.
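The abstract does not specify which GCNN variant was used. As a minimal sketch of how a graph convolution operates on a molecule (atoms as nodes, bonds as edges), the code below implements one layer of the widely used symmetric-normalized propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by a mean-pooling readout that turns per-atom vectors into one molecule-level vector. All function names are hypothetical, and in a real model the weight matrix W would be learned rather than supplied by hand.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adjacency: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adjacency: n x n bond matrix (1.0 where atoms i and j are bonded)
    features:  n x f per-atom feature matrix H (hypothetical featurization)
    weights:   f x g weight matrix W (learned in a trained model)
    """
    n = len(adjacency)
    # Add self-loops so each atom also keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Symmetric degree normalization keeps high-degree atoms from dominating.
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    norm_adj = [[d_inv_sqrt[i] * a_hat[i][j] * d_inv_sqrt[j] for j in range(n)]
                for i in range(n)]
    pre_activation = matmul(matmul(norm_adj, features), weights)
    return [[max(0.0, x) for x in row] for row in pre_activation]


def mean_readout(node_features: Matrix) -> list[float]:
    """Average per-atom vectors into a single molecule-level vector."""
    n = len(node_features)
    return [sum(column) / n for column in zip(*node_features)]
```

For example, the heavy atoms of ethanol (C-C-O) form a three-node chain, so `adjacency` would be `[[0, 1, 0], [1, 0, 1], [0, 1, 0]]` with one feature row per atom. Stacking several such layers lets information flow across progressively larger bond neighborhoods, which is how a GCNN can learn its own representation instead of relying on hand-engineered descriptors.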