In this article, we discuss the application of the Gaussian Process method for the prediction of absorption, distribution, metabolism, and excretion (ADME) properties. On the basis of a Bayesian probabilistic approach, the method is widely used in the field of machine learning but has rarely been applied in quantitative structure-activity relationship and ADME modeling. The method is suitable for modeling nonlinear relationships, does not require subjective determination of the model parameters, works for a large number of descriptors, and is inherently resistant to overtraining. The performance of Gaussian Processes compares well with and often exceeds that of artificial neural networks. Due to these features, the Gaussian Processes technique is eminently suitable for automatic model generation, one of the demands of modern drug discovery. Here, we describe the basic concept of the method in the context of regression problems and illustrate its application to the modeling of several ADME properties: blood-brain barrier, hERG inhibition, and aqueous solubility at pH 7.4. We also compare Gaussian Processes with other modeling techniques.
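To make the regression setting concrete, the following is a minimal sketch of Gaussian Process prediction using only the Python standard library. It is not the authors' implementation: the squared-exponential kernel, the fixed hyperparameters (`length_scale`, `signal_var`, `noise_var`), and all function names are assumptions chosen for illustration. In practice the hyperparameters would be inferred from the data rather than fixed by hand.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two descriptor vectors (assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def solve_lower(lower, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        s = sum(lower[i][k] * x[k] for k in range(i))
        x.append((b[i] - s) / lower[i][i])
    return x

def solve_lower_transposed(lower, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / lower[i][i]
    return x

def gp_predict(train_x, train_y, query_x, noise_var=1e-2):
    """Return the posterior mean and latent-function variance at query_x."""
    n = len(train_x)
    # Covariance of the training points, with observation noise on the diagonal.
    cov = [[rbf_kernel(train_x[i], train_x[j]) + (noise_var if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    # alpha = (K + noise_var * I)^-1 y, computed via two triangular solves.
    alpha = solve_lower_transposed(lower, solve_lower(lower, train_y))
    k_star = [rbf_kernel(x, query_x) for x in train_x]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(lower, k_star)
    variance = rbf_kernel(query_x, query_x) - sum(t * t for t in v)
    return mean, variance

# Toy data: one descriptor, target is sin(x). Purely illustrative.
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [math.sin(x[0]) for x in train_x]
mean_near, var_near = gp_predict(train_x, train_y, [1.0])
mean_far, var_far = gp_predict(train_x, train_y, [10.0])
```

The predictive variance grows for queries far from the training data, which is the property that lets a Gaussian Process model report how much confidence to place in each prediction.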
Aggregation is a common problem affecting biopharmaceutical development that can have a significant effect on the quality of the product, as well as the safety to patients, particularly because of the increased risk of immune reactions. Here, we describe a new high-throughput screening algorithm developed to classify antibody molecules based on their propensity to aggregate. The tool, constructed and validated on experimental aggregation data for over 500 antibodies, is able to discern molecules with a high aggregation propensity as defined by experimental criteria relevant to bioprocessing and manufacturing of these molecules. Furthermore, we show how this tool can be combined with other computational approaches during early drug development to select molecules with reduced risk of aggregation and optimal developability properties.
Prior to clinical development, a comprehensive pharmacokinetic characterization of a novel drug is required to understand its exposure at the site of action and elimination. Accordingly, in vitro assays and animal pharmacokinetic studies are regularly employed to predict drug exposure in humans, which is often costly and time-consuming. For this reason, the prediction of human pharmacokinetics at the point of design would be of high value for drug discovery. Therefore, we have established a comprehensive data curation protocol that enables machine learning evaluation of 12 human in vivo pharmacokinetic parameters using only chemical structure information and available doses for 1001 unique compounds. These machine learning models were thoroughly investigated and validated using both an independent hold-out test set and AstraZeneca clinical data. In addition, the availability of preclinical predictions for a subset of internal clinical candidates allowed us to compare our in silico approach with state-of-the-art pharmacokinetic predictions. Based on this evaluation, three fit-for-purpose models for AUC PO (R²(test) = 0.63; RMSE(test) = 0.76), Cmax PO (R²(test) = 0.68; RMSE(test) = 0.62), and Vdss IV (R²(test) = 0.47; RMSE(test) = 0.50) were identified. Based on the findings, our machine learning models have considerable potential for practical applications in drug discovery, such as influencing decision-making in drug discovery projects and progression of drug candidates toward the clinic.
In drug development, the “onus” of the low R&D efficiency has been put traditionally onto the drug discovery process (i.e., finding the right target or “binding” functionality). Here, we show that manufacturing is not only a central component of product success, but also that, by integrating manufacturing and discovery activities in a “holistic” interpretation of QbD methodologies, we could expect to increase the efficiency of the drug discovery process as a whole. In this new context, early risk assessment, using developability methodologies and computational methods in particular, can assist in reducing risks during development in a cost-effective way. We define specific areas of risk and how they can impact product quality in a broad sense, including essential aspects such as product efficacy and patient safety. Emerging industry practices around developability are introduced, including some specific examples of applications to biotherapeutics. Furthermore, we suggest some potential workflows to illustrate how developability strategies can be introduced in practical terms during early drug development in order to mitigate risks, reduce drug attrition and ultimately increase the robustness of the biopharmaceutical supply chain. Finally, we also discuss how the implementation of such methodologies could accelerate the access of new therapeutic treatments to patients in the clinic.
In this article, we present an automatic model generation process for building QSAR models using Gaussian Processes, a powerful machine learning modeling method. We describe the stages of the process that ensure models are built and validated within a rigorous framework: descriptor calculation, splitting data into training, validation and test sets, descriptor filtering, application of modeling techniques and selection of the best model. We apply this automatic process to data sets of blood-brain barrier penetration and aqueous solubility and compare the resulting automatically generated models with 'manually' built models using external test sets. The results demonstrate the effectiveness of the automatic model generation process for two types of data sets commonly encountered in building ADME QSAR models, a small set of in vivo data and a large set of physico-chemical data.
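The stages listed above (data splitting, descriptor filtering, fitting candidate models, selecting the best one) can be sketched as follows. This is only an illustration of the workflow shape, not the published process: a k-nearest-neighbour regressor stands in for the Gaussian Process models, the variance filter stands in for whatever descriptor filtering the authors use, and the split fractions, function names, and toy data are all assumptions.

```python
import math
import random

def split_indices(n, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle row indices and split them into training, validation and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def informative_columns(rows, min_variance=1e-8):
    """Descriptor filtering (assumed rule): drop near-constant descriptor columns."""
    keep = []
    for col in range(len(rows[0])):
        values = [row[col] for row in rows]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance > min_variance:
            keep.append(col)
    return keep

def knn_predict(train_x, train_y, query, k):
    """Stand-in model: average the targets of the k nearest training compounds."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))

def build_best_model(descriptors, targets, candidate_ks=(1, 3, 5), seed=0):
    """Fit each candidate, pick the best on validation data, report test error."""
    train, val, test = split_indices(len(descriptors), seed=seed)
    # Filter descriptors using the training set only, to avoid leaking test data.
    cols = informative_columns([descriptors[i] for i in train])

    def project(row):
        return [row[c] for c in cols]

    train_x = [project(descriptors[i]) for i in train]
    train_y = [targets[i] for i in train]

    def score(k, subset):
        preds = [knn_predict(train_x, train_y, project(descriptors[i]), k)
                 for i in subset]
        return rmse([targets[i] for i in subset], preds)

    best_k = min(candidate_ks, key=lambda k: score(k, val))
    return {"k": best_k, "columns": cols,
            "validation_rmse": score(best_k, val),
            "test_rmse": score(best_k, test)}

# Toy data: the second descriptor is constant and should be filtered out.
descriptors = [[float(x), 1.0] for x in range(40)]
targets = [2.0 * row[0] for row in descriptors]
result = build_best_model(descriptors, targets)
```

Keeping the test set out of both descriptor filtering and model selection is what lets the final `test_rmse` serve as an unbiased check on the selected model.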
ADMET models, whether in silico or in vitro, are commonly used to 'profile' molecules, to identify potential liabilities or filter out molecules expected to have undesirable properties. While useful, this is the most basic application of such models. Here, we will show how models may be used to go 'beyond profiling' to guide key decisions in drug discovery. For example, models can support the selection of chemical series on which to focus resources with confidence, or the design of improved molecules through structural modifications targeted at key properties. To prioritise molecules and chemical series, the success criteria for properties and their relative importance to a project's objective must be defined. Data from models (experimental or predicted) may then be used to assess each molecule's balance of properties against those requirements. However, to make decisions with confidence, the uncertainties in all of the data must also be considered. In silico models encode information regarding the relationship between molecular structure and properties. This is used to predict the property value of a novel molecule. However, further interpretation can yield information on the contributions of different groups in a molecule to the property and the sensitivity of the property to structural changes. Visualising this information can guide the redesign process. In this article, we describe methods to achieve these goals and drive drug-discovery decisions and illustrate the results with practical examples.
Animal pharmacokinetic (PK) data as well as human and animal in vitro systems are utilized in drug discovery to define the rate and route of drug elimination. Accurate prediction and mechanistic understanding of drug clearance and disposition in animals provide a degree of confidence for extrapolation to humans. In addition, prediction of in vivo properties can be used to improve design during drug discovery, help select compounds with better properties, and reduce the number of in vivo experiments. In this study, we generated machine learning models able to predict rat in vivo PK parameters and concentration–time PK profiles based on the molecular chemical structure and either measured or predicted in vitro parameters. The models were trained on internal in vivo rat PK data for over 3000 diverse compounds from multiple projects and therapeutic areas, and the predicted endpoints include clearance and oral bioavailability. We compared the performance of various traditional machine learning algorithms and deep learning approaches, including graph convolutional neural networks. The best models for PK parameters achieved R² = 0.63 [root mean squared error (RMSE) = 0.26] for clearance and R² = 0.55 (RMSE = 0.46) for bioavailability. The models provide a fast and cost-efficient way to guide the design of molecules with optimal PK profiles, to enable the prediction of virtual compounds at the point of design, and to drive prioritization of compounds for in vivo assays.
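For readers less familiar with the two statistics quoted above, here is a short sketch of how R² and RMSE are commonly computed on a held-out set. The abstract does not say which definition of R² was used, nor the units or transformation of the endpoints, so the coefficient-of-determination form below and the toy numbers are assumptions.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error: typical size of a prediction error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Toy values only; every prediction is offset from the observation by 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
example_rmse = rmse(observed, predicted)
example_r2 = r_squared(observed, predicted)
```

Under this definition, a model that always predicts the mean of the observed values scores R² = 0, and a perfect model scores R² = 1.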
In this article, we extend the application of the Gaussian processes technique to classification quantitative structure-activity relationship modeling problems. We explore two approaches, an intrinsic Gaussian processes classification technique and a probit treatment of the Gaussian processes regression method. Here, we describe the basic concepts of the methods and apply these techniques to building category models of absorption, distribution, metabolism, excretion, toxicity and target activity data. We also compare the performance of Gaussian processes for classification to other known computational methods, namely decision trees, random forest, support vector machines, and probit partial least squares. The results indicate that, while no method consistently generates the best model, the Gaussian processes classifier often produces more predictive models than those of the random forest or support vector machines and was rarely significantly outperformed.
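One plausible reading of a probit treatment of Gaussian Process regression is sketched below: the regression model's predictive mean and variance for a compound are converted into a class probability with the standard normal cumulative distribution function. The abstract gives no formula, so this interpretation, the function names, and the example threshold are assumptions rather than the authors' method.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_above(mean, variance, threshold):
    """P(value > threshold) when the prediction is Gaussian with the given mean and variance."""
    return normal_cdf((mean - threshold) / math.sqrt(variance))

def classify(mean, variance, threshold, cutoff=0.5):
    """Assign the 'high' class when the probability of exceeding the threshold passes the cutoff."""
    return "high" if probability_above(mean, variance, threshold) > cutoff else "low"

# Hypothetical prediction: mean 5.5 with variance 0.25, class boundary at 5.0.
p_high = probability_above(5.5, 0.25, 5.0)
label = classify(5.5, 0.25, 5.0)
```

A prediction sitting exactly on the threshold yields a probability of 0.5, and a larger predictive variance pulls every probability toward 0.5, so uncertain predictions produce less decisive class assignments.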