Literature data on compounds both well and poorly absorbed in humans were used to build a statistical pattern recognition model of passive intestinal absorption. Robust outlier detection was used to analyze the well-absorbed compounds, some of which were intermingled with the poorly absorbed compounds in the model space. The outliers were identified as actively transported compounds. The descriptors chosen for the model were polar surface area (PSA) and AlogP98, selected on the basis of the physical processes involved in membrane permeability and of the interrelationships and redundancies among the available descriptors. Both descriptors are straightforward for a medicinal chemist to interpret, which enhances the utility of the model. Molecular weight, although often used in passive absorption models, was shown to be superfluous because it is already a component of both PSA and AlogP98. Extensive validation of the model on hundreds of known orally delivered drugs, "drug-like" molecules, and Pharmacopeia, Inc. compounds that had been assayed for Caco-2 cell permeability demonstrated a good rate of successful predictions (74-92%, depending on the dataset and the exact criterion used).
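The abstract does not state the model's decision rule. One common way to turn a two-descriptor pattern recognition model into a yes/no prediction is a confidence ellipse around the well-absorbed compounds, and the sketch below assumes that rule. It fits a mean and covariance in (PSA, AlogP98) space and predicts "well absorbed" when a compound's squared Mahalanobis distance falls inside the chi-square quantile for 2 degrees of freedom. The training values, the function names, and the 95% level are hypothetical; in the robust setting described above, the mean and covariance would be estimated after the actively transported outliers had been removed.

```python
import numpy as np

# Hypothetical (PSA, AlogP98) pairs for well-absorbed training compounds.
# Illustrative values only; these are NOT data from the study.
train = np.array([
    [40.0, 2.1],
    [65.0, 3.0],
    [90.0, 1.5],
    [55.0, 4.2],
    [110.0, 2.8],
    [30.0, 3.5],
    [75.0, 0.9],
    [85.0, 2.4],
])


def chi2_2dof_quantile(p):
    """Chi-square quantile for 2 degrees of freedom: the CDF is 1 - exp(-x/2)."""
    return -2.0 * np.log(1.0 - p)


def fit_ellipse(points):
    """Return the center and inverse covariance of the descriptor cloud."""
    center = points.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(points, rowvar=False))
    return center, inv_cov


def predicted_well_absorbed(x, center, inv_cov, p=0.95):
    """Hypothetical rule: inside the p-level ellipse => predicted well absorbed."""
    d = np.asarray(x, dtype=float) - center
    d2 = float(d @ inv_cov @ d)  # squared Mahalanobis distance
    return d2 <= chi2_2dof_quantile(p)


center, inv_cov = fit_ellipse(train)
print(predicted_well_absorbed([60.0, 2.5], center, inv_cov))
print(predicted_well_absorbed([400.0, -5.0], center, inv_cov))
```

Because both descriptors enter through a single covariance matrix, the ellipse accounts for any correlation between PSA and AlogP98 rather than applying two independent cutoffs.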
Blockade of the hERG potassium channel prolongs the ventricular action potential (AP) and the QT interval, and triggers early afterdepolarizations (EADs) and torsade de pointes (TdP) arrhythmia. Opinions differ on the causal relationship between hERG blockade and TdP, the relative weighting of other contributing factors, the definitive metrics of preclinical proarrhythmicity, and the true safety margin in humans. Here, we used in silico techniques to characterize the effects of channel gating and binding kinetics on hERG occupancy, and the effects of blockade on the human ventricular AP. Gating effects differ between compounds that are sterically compatible with closed channels, and therefore become trapped in deactivated channels, and compounds that are incompatible with the closed/closing state and are expelled during deactivation. Occupancies of trappable blockers build to equilibrium levels, whereas those of non-trappable blockers build and decay during each AP cycle. In our AP simulations, EADs were caused by occupancies of ~83% of open/inactive channels for non-trappable blockers versus ~63% for trappable blockers. Overall, we conclude that hERG occupancy at therapeutic exposure levels may be tolerated for non-trappable blockers, but not for trappable blockers capable of building to the proarrhythmic occupancy level. Furthermore, the widely used Redfern safety index may be biased toward trappable blockers, overestimating the exposure-IC50 separation in non-trappable cases.
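A toy calculation can illustrate why trappable and non-trappable blockers behave differently over repeated AP cycles. The sketch below is not the authors' simulation. It assumes first-order binding only while channels are open/inactivated (a hypothetical fixed 300 ms window in each 1000 ms cycle), hypothetical rate constants, retention of bound drug in closed channels for trappable blockers, and instantaneous expulsion at deactivation for non-trappable blockers. Binding is integrated with forward Euler steps.

```python
def peak_occupancy(trappable, kon_c=0.005, koff=0.001, cycle_ms=1000.0,
                   open_ms=300.0, n_cycles=20, dt=0.5):
    """Toy hERG occupancy model with hypothetical parameters (rates in 1/ms).

    kon_c is the pseudo-first-order on-rate (kon * drug concentration).
    Returns the highest fraction of channels occupied by drug during the run.
    """
    steps_per_cycle = int(round(cycle_ms / dt))
    open_steps = int(round(open_ms / dt))
    bound = 0.0
    peak = 0.0
    for _ in range(n_cycles):
        for step in range(steps_per_cycle):
            if step < open_steps:
                # Open/inactivated window: forward-Euler step of
                # dB/dt = kon*C*(1 - B) - koff*B
                bound += dt * (kon_c * (1.0 - bound) - koff * bound)
                peak = max(peak, bound)
            elif step == open_steps and not trappable:
                # Assumption: non-trappable drug is expelled on deactivation.
                bound = 0.0
            # Other closed steps leave occupancy unchanged (trapped drug
            # stays bound; expelled drug stays at zero).
    return peak


trappable_peak = peak_occupancy(trappable=True)
non_trappable_peak = peak_occupancy(trappable=False)
print(f"trappable: {trappable_peak:.3f}, non-trappable: {non_trappable_peak:.3f}")
```

With these hypothetical constants, the trappable blocker's occupancy climbs over successive cycles toward the equilibrium value kon·C/(kon·C + koff) ≈ 0.83, whereas the non-trappable blocker restarts from zero each cycle and peaks near 0.70.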
Docking and scoring are currently among the tools used for hit finding and hit-to-lead optimization when structural information about the target is known. Docking scores have been found useful for optimizing ligand poses to reproduce experimentally observed binding modes. The question is whether docking and scoring can be used reliably for hit-to-lead optimization. To illustrate the challenges of scoring for hit-to-lead optimization, the relationship between docking scores and experimentally determined IC50 values measured in-house was tested. The influence of the particular target, the crystal structure, and the precision of the scoring function on the ability to differentiate between actives and inactives was analyzed by calculating the area under the receiver operating characteristic (ROC) curve for the docking scores. For the test sets considered, molecular weight (MW), and sometimes ClogP, were as useful as GlideScores, and no significant difference was observed between Glide standard-precision (SP) and extra-precision (XP) scores in differentiating actives from inactives. Interpretation by an expert is still required to use docking and scoring successfully in hit-to-lead optimization.
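The ROC area used above can be computed directly as the probability that a randomly chosen active is ranked ahead of a randomly chosen inactive (the Mann-Whitney formulation), with ties counted as one half. The sketch below assumes that more negative GlideScores are better. The same function can score a plain descriptor such as MW, where the ranking direction must be chosen explicitly. The function name and the scores in the usage lines are hypothetical.

```python
def roc_auc(active_scores, inactive_scores, lower_is_better=True):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    if not active_scores or not inactive_scores:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a == i:
                wins += 0.5
            elif (a < i) == lower_is_better:
                # The active is ranked ahead of the inactive.
                wins += 1.0
    return wins / (len(active_scores) * len(inactive_scores))


# Hypothetical docking scores (more negative = better predicted binding).
actives = [-9.1, -8.4, -7.0]
inactives = [-7.5, -6.2, -5.9]
print(roc_auc(actives, inactives))  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking. Computing the AUC for GlideScore and for MW on the same actives and inactives (with `lower_is_better=False` for MW if larger compounds tend to be active) is one way to check whether docking adds information beyond molecular size.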
The unreliability of multivariate outlier detection techniques such as the Mahalanobis distance and hat matrix leverage has been known in the statistical community for well over a decade. However, only within the past few years has a serious effort been made to introduce robust methods for detecting multivariate outliers into the chemical literature. Techniques such as the minimum volume ellipsoid (MVE), multivariate trimming (MVT), M-estimators (e.g., PROP), and similar methods such as the minimum covariance determinant (MCD) rely on algorithms that are difficult to program and may require significant processing time. Although MCD and MVE have been shown to be statistically sound, we found MVT unreliable because it uses the Mahalanobis distance in its initial step. We examined the performance of MCD and MVT on selected data sets and in simulations, and compared the results with two methods of our own devising. Both proposed methods, resampling by half-means and the smallest half-volume method, are simple to use and conceptually clear, and they provide results superior to those of MVT and of MCD, the current best-performing technique. Either proposed method is recommended for detecting multiple outliers in multivariate data.
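The abstract does not spell out the proposed algorithms, so the sketch below is only a simplified illustration of the general idea behind half-sample methods such as the smallest half-volume approach. It estimates location and scatter from a compact half of the data, so that a cluster of outliers cannot inflate the covariance and mask itself, and then computes Mahalanobis distances for every point. The subset-selection rule (the point whose h nearest neighbors have the smallest summed Euclidean distance, with h just over n/2) is an assumption, and the function name and data are hypothetical. A practical method would also need a calibrated cutoff, because a covariance estimated from half the data underestimates the full scatter.

```python
import numpy as np


def half_subset_distances(X):
    """Hypothetical simplified sketch: Mahalanobis distances of all points
    from a mean/covariance estimated on a compact half-subset only.
    Requires h = n // 2 + 1 to exceed the number of variables."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    h = n // 2 + 1
    # Pairwise Euclidean distance matrix.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    # For each point, its h nearest points (including itself) and their spread.
    nearest = np.argsort(D, axis=1)[:, :h]
    spread = np.take_along_axis(D, nearest, axis=1).sum(axis=1)
    # Assumed rule: the most compact neighborhood defines the "clean" half.
    subset = X[nearest[np.argmin(spread)]]
    center = subset.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(subset, rowvar=False))
    d = X - center
    # sqrt(d_i^T S^-1 d_i) for every row i.
    return np.sqrt(np.einsum("ij,jk,ik->i", d, inv_cov, d))


# Hypothetical data: 10 clustered points followed by 3 clustered outliers.
clean = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
         (0.2, 0.8), (0.8, 0.2), (0.3, 0.3), (0.7, 0.7), (0.4, 0.6)]
outliers = [(10, 10), (10.5, 9.5), (9.5, 10.5)]
dist = half_subset_distances(np.array(clean + outliers))
print(np.round(dist, 1))
```

Because the three outliers sit together, a covariance estimated from all 13 points would be stretched toward them; estimating it from the compact half keeps their distances large.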
Copy toner samples were analyzed by reflection-absorption infrared microscopy (R-A IR). The grouping of copy toners into distinguishable classes obtained by visual comparison and computer-assisted spectral matching was compared with that obtained by multivariate discriminant analysis. For a data set of 430 copy toner spectra, 90% (388/430) were initially classified correctly into the groups previously established by spectral matching. Three groups that did not classify well contained too few samples to allow reliable classification. Samples from two other pairs of groups were similar and often misclassified. Closer examination of the spectra from these groups revealed discriminating features that could be used in separate discriminant analyses to improve classification. For one pair, classification accuracy improved to 91% (81/89) and 97% (28/29) for the two groups, respectively. The groups in the other pair were completely distinguishable from one another. With these additional tests, multivariate discriminant analysis correctly classified 96% of the 430 R-A IR toner spectra into the toner groups previously found by spectral matching.
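The abstract does not state which discriminant analysis variant was used. As a hedged illustration of a pairwise separation like those described above, the sketch below implements Fisher's two-group linear discriminant: it projects samples onto w = S_W^-1 (m_1 - m_0), where S_W is the pooled within-group scatter, and assigns each sample to the group whose projected mean is nearer. The two "band intensity" features, their values, and the function names are hypothetical. Full IR spectra usually have more wavenumbers than samples, which would make S_W singular, so in practice the spectra would first be reduced, for example to selected discriminating bands.

```python
import numpy as np


def fisher_lda(X0, X1):
    """Fisher two-group discriminant: returns projection vector and threshold."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group scatter matrix.
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw, m1 - m0)
    # Midpoint of the projected group means.
    threshold = 0.5 * (w @ m0 + w @ m1)
    return w, threshold


def classify(X, w, threshold):
    """Label 1 if the projection falls on group 1's side, else 0."""
    return (X @ w > threshold).astype(int)


# Hypothetical absorbances at two discriminating bands for two toner groups.
X0 = np.array([[0.20, 0.50], [0.25, 0.55], [0.22, 0.48], [0.18, 0.52]])
X1 = np.array([[0.60, 0.30], [0.65, 0.35], [0.58, 0.28], [0.62, 0.33]])
w, t = fisher_lda(X0, X1)
print(classify(X0, w, t), classify(X1, w, t))
```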
The advent of combinatorial chemistry has made available very large data sets of molecules screened against a broad range of targets. This information has led to the realization that ADME (absorption, distribution, metabolism, and excretion) and toxicity issues should be considered before library synthesis. Furthermore, these large data sets provide a unique and important source of information about which types of molecular shapes may interact with specific receptors or target classes. Rapid and accurate data mining tools therefore became essential. To address these issues, Pharmacopeia, Inc. formed a computational research group, the Center for Informatics and Drug Discovery (CIDD). In this review we cover the work done by this group on in silico ADME modeling and on the data mining issues Pharmacopeia faced because of its large and diverse collection (over 6 million discrete compounds) of drug-like molecules. In the data mining arena, we discuss rapid docking tools and how we employ them, and we describe a novel data mining tool based on a 1D representation of a molecule followed by a molecular sequence alignment step. For the ADME area, we discuss the development and application of absorption, blood-brain barrier (BBB), and solubility models. Finally, we summarize the impact these tools and approaches might have on the drug discovery process.
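The review summary does not describe how the 1D molecular representation is built or aligned. To illustrate the general idea of comparing molecules by sequence alignment, the sketch below applies classical Needleman-Wunsch global alignment to two lists of tokens. The token vocabulary (for example "ring" or "amide"), the function name, and the match, mismatch, and gap scores are hypothetical placeholders, not the CIDD tool's encoding or parameters.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score of two token sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per token.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]


# Hypothetical 1D token strings for two molecules.
mol_a = ["ring", "amide", "ring", "acid"]
mol_b = ["ring", "amide", "acid"]
print(global_alignment_score(mol_a, mol_a), global_alignment_score(mol_a, mol_b))
```

Higher scores indicate more similar token sequences; the second molecule aligns to the first with three matches and one gap.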