In this article, we discuss the application of the Gaussian Process method for the prediction of absorption, distribution, metabolism, and excretion (ADME) properties. Based on a Bayesian probabilistic approach, the method is widely used in the field of machine learning but has rarely been applied in quantitative structure-activity relationship and ADME modeling. The method is suitable for modeling nonlinear relationships, does not require subjective determination of the model parameters, works for a large number of descriptors, and is inherently resistant to overtraining. The performance of Gaussian Processes compares well with and often exceeds that of artificial neural networks. Due to these features, the Gaussian Processes technique is eminently suitable for automatic model generation, one of the demands of modern drug discovery. Here, we describe the basic concept of the method in the context of regression problems and illustrate its application to the modeling of several ADME properties: blood-brain barrier, hERG inhibition, and aqueous solubility at pH 7.4. We also compare Gaussian Processes with other modeling techniques.
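A minimal sketch of Gaussian Process regression, added to make the regression setting above concrete. It assumes a zero-mean prior, a squared-exponential covariance, and fixed hyperparameters (`length_scale`, `signal_var`, `noise_var`), all of which are illustrative choices and not taken from the article. In practice the hyperparameters would be inferred from the data, which this sketch omits. The descriptor values and responses are made up.

```python
import math

def rbf(x, y, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two descriptor vectors."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return signal_var * math.exp(-0.5 * sq / length_scale ** 2)

def cholesky(a):
    """Lower-triangular factor low with low @ low^T = a (a positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def solve_lower(low, b):
    """Solve low @ z = b by forward substitution."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def solve_lower_transposed(low, z):
    """Solve low^T @ x = z by back substitution."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - s) / low[i][i]
    return x

def gp_predict(x_train, y_train, x_new, noise_var=0.01):
    """Posterior mean and variance of the latent function at one new point."""
    n = len(x_train)
    # Training covariance with observation noise on the diagonal.
    k = [[rbf(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    alpha = solve_lower_transposed(low, solve_lower(low, y_train))
    k_star = [rbf(x_new, xi) for xi in x_train]
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = rbf(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, var

# Made-up one-descriptor training set.
x_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 0.8, 0.9, 0.1]
mean_near, var_near = gp_predict(x_train, y_train, [1.5])
mean_far, var_far = gp_predict(x_train, y_train, [10.0])
```

The predictive variance is small between training points and returns to the prior variance far from them, so each prediction carries its own uncertainty estimate.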
We have previously used two general solvation equations to correlate and to interpret a wide variety of physicochemical and biochemical properties of compounds (solutes). Application of these equations requires knowledge of the relevant solute descriptors, viz. $R_2$, the excess molar refraction; $\pi_2^{H}$, the solute dipolarity/polarizability; $\sum\alpha_2^{H}$ and $\sum\beta_2^{H}$, the solute overall hydrogen bond acidity and basicity; and $\log L^{16}$, where $L^{16}$ is the solute gas-hexadecane partition coefficient at 298 K. We have also shown that these solute descriptors can be obtained from partition coefficients of solutes in various water-solvent and gas-solvent systems. Here, we use this approach to calculate solute descriptors for a series of 18 organofluorocarbons, classed as refrigerants, including chlorofluorocarbons, hydrochlorofluorocarbons, hydrofluorocarbons and perfluorocarbons, using Henry's law coefficients in water and five organic solvents that we have already measured. These data have been used to calculate Ostwald solubility coefficients, log L. Gas-water and gas-solvent partitions have then been combined to give log P for partition between water and solvent. A number of log P and L values have also been taken from the Medchem97 database. There are enough data to obtain the above descriptors for the 18 organofluorocarbons, and then to estimate log P and L values in a large number of other solvents. The chemosensory properties of the organofluorocarbons are also estimated.
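The arithmetic described above can be sketched as follows. The abstract does not write out either solvation equation, so the linear form used in `predict_log_sp` (a constant plus one coefficient per descriptor) is an assumption based on the usual gas-phase form of such equations. All coefficient and descriptor values are placeholders, not fitted or measured values, and the function and key names are hypothetical.

```python
def predict_log_sp(coeffs, descriptors):
    """Evaluate an assumed linear solvation equation:
    log SP = c + r*R2 + s*pi2H + a*sum_alpha2H + b*sum_beta2H + l*logL16.
    """
    return (coeffs["c"]
            + coeffs["r"] * descriptors["R2"]
            + coeffs["s"] * descriptors["pi2H"]
            + coeffs["a"] * descriptors["sum_alpha2H"]
            + coeffs["b"] * descriptors["sum_beta2H"]
            + coeffs["l"] * descriptors["logL16"])

def log_p_water_to_solvent(log_l_solvent, log_l_water):
    """Combine gas-solvent and gas-water partitions.
    P = L_solvent / L_water, so log P = log L_solvent - log L_water.
    """
    return log_l_solvent - log_l_water

# Placeholder system coefficients and solute descriptors (not real data).
solvent_coeffs = {"c": 0.1, "r": 0.0, "s": 1.0, "a": 2.0, "b": 0.5, "l": 0.8}
solute = {"R2": -0.2, "pi2H": 0.3, "sum_alpha2H": 0.0,
          "sum_beta2H": 0.1, "logL16": 1.5}

log_l_solvent = predict_log_sp(solvent_coeffs, solute)
log_p = log_p_water_to_solvent(log_l_solvent, log_l_water=-0.35)
```

Once a solute's descriptors are known, the same two steps can be repeated for any solvent whose coefficients are available.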
Using 1-butanol and 2-heptanone as stimuli, we measured detectability (i.e., psychometric) functions for the odor, nasal pungency, and eye irritation of these two substances alone and in binary mixtures. Nasal pungency responses were tested in subjects lacking olfaction (i.e., anosmics), for whom odors do not interfere. Eye irritation responses were tested in normosmics and anosmics and found to be similar in both groups, so their results were pooled. When all stimuli (single and mixtures) were transformed into concentration units of one (or the other) chemical, a single function could fit all data from the same sensory endpoint with a correlation coefficient of 0.91 or higher. The outcome lends support, as a first approximation, to the notion of chemosensory agonism, in the sense of dose additivity, between the members of binary mixtures presented at perithreshold levels.
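One way to read the transformation into concentration units of one chemical is the standard dose-addition calculation sketched below. This is an assumption about the procedure, not a description taken from the study. The relative potency is taken here as the ratio of equally detectable concentrations of the two chemicals, and all numbers are invented for illustration.

```python
def relative_potency(equi_effective_a, equi_effective_b):
    """Potency of B relative to A, from concentrations of A and B
    that produce the same detectability (assumed definition)."""
    return equi_effective_a / equi_effective_b

def equivalent_concentration_of_a(conc_a, conc_b, potency_b_vs_a):
    """Express a binary mixture in concentration units of chemical A
    under dose additivity: each unit of B counts as potency_b_vs_a units of A."""
    return conc_a + potency_b_vs_a * conc_b

# Invented example: 20 units of A are as detectable as 10 units of B,
# so B is twice as potent as A.
potency = relative_potency(equi_effective_a=20.0, equi_effective_b=10.0)
mixture_as_a = equivalent_concentration_of_a(conc_a=10.0, conc_b=5.0,
                                             potency_b_vs_a=potency)
```

If dose additivity holds, single chemicals and mixtures expressed this way should fall on one shared detectability function.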
Odour detection thresholds that we obtained previously have been analysed by a general equation for selective transport. It is shown that such selective transport can account for some 77% of the total effect. The remainder is due to a specific size effect, which might involve odour-binding proteins, and a specific effect for aldehydes and carboxylic acids. Our analysis raises the question of whether selective transport is physically separable from the specific effects of receptor activation. The model predicts a chemical cut-off in odour detection along any homologous series.
The success of any drug will depend on how closely it achieves an ideal combination of potency, selectivity, pharmacokinetics and safety. The key to achieving this success efficiently is to consider the overall balance of molecular properties of compounds against the ideal profile for the therapeutic indication from the earliest stages of a drug discovery project. The use of in silico predictive models of absorption, distribution, metabolism and elimination (ADME) and physicochemical properties is a major aid in this exercise, as it enables virtual molecules to be assessed across a broad range of properties from initial library generation through to candidate selection. Of course, no measurement, whether in silico, in vitro or in vivo, is perfect, and the uncertainties in any data should be explicitly taken into account when basing conclusions on test results. In addition, in the early stages of drug discovery, when designing a lead-seeking library or building compound structure-activity relationships, the quality of any set of molecules should also be balanced against the chemical diversity covered. Here, a scheme is presented for achieving these goals based on a suite of predictive ADME models, probabilistic scoring and multiobjective optimisation for library design. The use of this platform for applications in lead identification and optimisation is illustrated.
In this article, we present an automatic model generation process for building QSAR models using Gaussian Processes, a powerful machine learning modeling method. We describe the stages of the process that ensure models are built and validated within a rigorous framework: descriptor calculation, splitting data into training, validation and test sets, descriptor filtering, application of modeling techniques and selection of the best model. We apply this automatic process to data sets of blood-brain barrier penetration and aqueous solubility and compare the resulting automatically generated models with 'manually' built models using external test sets. The results demonstrate the effectiveness of the automatic model generation process for two types of data sets commonly encountered in building ADME QSAR models, a small set of in vivo data and a large set of physico-chemical data.
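The stages listed above (data splitting, descriptor filtering, model selection) can be sketched roughly as below. The split fractions, the variance-based filter, the RMSE criterion and all function names are assumptions made for illustration. The abstract does not specify how each stage is implemented, and the candidate models here are trivial stand-ins, not Gaussian Processes.

```python
import math
import random
from statistics import pvariance

def split_indices(n, fractions=(0.7, 0.15), seed=0):
    """Shuffle row indices and cut them into training, validation and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def filter_descriptors(rows, min_variance=1e-8):
    """Keep the indices of descriptor columns that actually vary across compounds."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if pvariance(col) > min_variance]

def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(observed))

def select_best(models, x_val, y_val):
    """Score each candidate on the validation set and return the best name."""
    scores = {name: rmse([predict(x) for x in x_val], y_val)
              for name, predict in models.items()}
    return min(scores, key=scores.get), scores

# Toy data: the first descriptor is constant and gets filtered out.
rows = [[1.0, 5.0, 0.2], [1.0, 6.0, 0.4], [1.0, 7.0, 0.1]]
kept = filter_descriptors(rows)

train, val, test = split_indices(20)

# Stand-in candidate models, scored on a tiny validation set.
candidates = {"constant": lambda x: 2.0, "linear": lambda x: 2.0 * x[0]}
best, scores = select_best(candidates, x_val=[[1.0], [2.0]], y_val=[2.0, 4.0])
```

The test set plays no part in model selection, so it remains available for a final check of the chosen model.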