Lori B. Pfahler scite author profile

We have developed an algorithm that clusters structural databases using topological similarity. The first step in this procedure is to identify a set of probe structures that all fall outside a defined similarity score cutoff with respect to one another. This list of probes is then used to bin the remaining compounds in the database. In the last step, some housekeeping is performed to ensure that each compound in the dataset is either a probe or is contained in one and only one bin. We have applied this clustering method to a database of ∼27 000 compounds for which we have screening-level biological data. Analysis of the resulting clusters shows that clusters defined by an active probe are much more likely to contain other active compounds than clusters defined by an inactive probe. Indeed, the incidence of active compounds in bins with active probes is anywhere from 6 to 10 times greater than the incidence of active compounds in the database as a whole. This result demonstrates the power of simple two-dimensional topological descriptors and serves to validate our clustering algorithm.
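The three steps in this abstract (probe selection, binning, housekeeping) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the similarity measure, the ordering of compounds, or the housekeeping rule, so the Tanimoto similarity on feature sets, the random shuffle, the "assign to the most similar probe" rule, and all function names here are assumptions.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity between two sets of feature identifiers (assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_with_probes(fingerprints, cutoff, seed=0):
    """Cluster compounds around mutually dissimilar probes.

    fingerprints: dict mapping compound name -> frozenset of features.
    cutoff: two compounds count as similar when similarity >= cutoff.
    Returns a dict mapping each probe name -> list of compounds in its bin.
    """
    rng = random.Random(seed)
    order = list(fingerprints)
    rng.shuffle(order)  # assumed: compounds are visited in random order

    # Step 1: a compound becomes a probe only if it falls below the
    # similarity cutoff with respect to every probe chosen so far.
    probes = []
    for name in order:
        fp = fingerprints[name]
        if all(tanimoto(fp, fingerprints[p]) < cutoff for p in probes):
            probes.append(name)

    # Steps 2 and 3: bin the remaining compounds. Assigning each one to its
    # single most similar probe guarantees it lands in exactly one bin.
    bins = {p: [] for p in probes}
    probe_set = set(probes)
    for name in order:
        if name in probe_set:
            continue
        fp = fingerprints[name]
        best = max(probes, key=lambda p: tanimoto(fp, fingerprints[p]))
        bins[best].append(name)
    return bins


# Toy data: the feature sets are made up for illustration.
toy = {
    "a": frozenset({1, 2, 3, 4}),
    "b": frozenset({1, 2, 3, 5}),
    "c": frozenset({10, 11, 12}),
    "d": frozenset({10, 11, 13}),
    "e": frozenset({20}),
}
toy_bins = cluster_with_probes(toy, cutoff=0.4)
```

Because every non-probe compound was rejected as a probe for being at least as similar as the cutoff to some existing probe, each one always has a probe to join, and the toy data separate into three groups regardless of the shuffle order.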


Diversity and Coverage of Structural Sublibraries Selected Using the SAGE and SCA Algorithms

Reynolds

Tropsha

Pfahler

et al. 2001

J. Chem. Inf. Comput. Sci.


It is often impractical to synthesize and test all compounds in a large exhaustive chemical library. Herein, we discuss rational approaches to selecting representative subsets of virtual libraries that help direct experimental synthetic efforts for diverse library design. We compare the performance of two stochastic sampling algorithms, Simulated Annealing Guided Evaluation (SAGE; Zheng, W.; Cho, S. J.; Waller, C. L.; Tropsha, A. J. Chem. Inf. Comput. Sci. 1999, 39, 738-746.) and Stochastic Cluster Analysis (SCA; Reynolds, C. H.; Druker, R.; Pfahler, L. B. Lead Discovery Using Stochastic Cluster Analysis (SCA): A New Method for Clustering Structurally Similar Compounds. J. Chem. Inf. Comput. Sci. 1998, 38, 305-312.) for their ability to select both diverse and representative subsets of the entire chemical library space. The SAGE and SCA algorithms were compared using u- and s-optimal metrics as an independent assessment of diversity and coverage. This comparison showed that both algorithms were capable of generating sublibraries in descriptor space that are diverse and give reasonable coverage (i.e., are representative) of the original full library. Tests were carried out using simulated two-dimensional data sets and a 27 000-compound proprietary structural library as represented by computed Molconn-Z descriptors. One of the key observations from this work is that the algorithmically simple SCA method is capable of selecting subsets that are comparable to those selected by the more computationally intensive SAGE method.
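To make the distinction between diversity and coverage concrete, the sketch below scores a subset of points in a two-dimensional descriptor space. These are not the u- and s-optimal metrics used in the paper, whose definitions the abstract does not give; they are generic stand-ins (smallest pairwise distance within the subset for diversity, largest distance from any library point to its nearest subset member for coverage), and the function names and toy coordinates are assumptions.

```python
import math


def min_pairwise_distance(subset):
    """Diversity stand-in: smallest distance between any two selected points.

    Larger values mean the selected points are more spread out.
    Requires at least two points in the subset.
    """
    return min(
        math.dist(a, b)
        for i, a in enumerate(subset)
        for b in subset[i + 1:]
    )


def max_nearest_distance(library, subset):
    """Coverage stand-in: worst-case distance from a library point to the subset.

    Smaller values mean every library point has a selected neighbor nearby.
    """
    return max(min(math.dist(x, s) for s in subset) for x in library)


# Toy two-dimensional "library" (made-up coordinates).
library = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
corners = [(0.0, 0.0), (1.0, 1.0)]
centered = [(0.0, 0.0), (0.5, 0.5)]

corner_diversity = min_pairwise_distance(corners)
corner_coverage = max_nearest_distance(library, corners)
centered_diversity = min_pairwise_distance(centered)
centered_coverage = max_nearest_distance(library, centered)
```

In this toy case the corner pair is more diverse but leaves two library points a full unit away, while the pair that includes the center point is less diverse but covers the library more tightly, which is the trade-off a subset selection method has to balance.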


Statistical Applications for Chemistry, Manufacturing and Controls (CMC) in the Pharmaceutical Industry

Burdick

Pfahler

Zhang

et al. 2017


Chemical Information Based Scaling of Molecular Descriptors: A Universal Chemical Scale for Library Design and Analysis

Tounge

Pfahler

Reynolds

2002

J. Chem. Inf. Comput. Sci.


Scaling is a difficult issue for any analysis of chemical properties or molecular topology when disparate descriptors are involved. To compare properties across different data sets, a common scale must be defined. Using several publicly available databases (ACD, CMC, MDDR, and NCI) as a basis, we propose to define chemically meaningful scales for a number of molecular properties and topology descriptors. These chemically derived scaling functions have several advantages. First, it is possible to define chemically relevant scales, greatly simplifying similarity and diversity analyses across data sets. Second, this approach provides a convenient method for setting descriptor boundaries that define chemically reasonable topology spaces. For example, descriptors can be scaled so that compounds with little potential for biological activity, bioavailability, or other drug-like characteristics are easily identified as outliers. We have compiled scaling values for 314 molecular descriptors. In addition, the 10th and 90th percentile values for each descriptor have been calculated for use in outlier filtering.
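The idea of deriving a scale and outlier bounds from a reference database can be illustrated with a short sketch. The abstract does not give the form of the scaling functions, so the linear mapping below (10th percentile to 0, 90th percentile to 1), the linear-interpolation percentile, the function names, and the toy reference values are all assumptions made for illustration.

```python
def percentile(values, q):
    """Percentile with linear interpolation between ranks; q is in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def make_scale(reference_values):
    """Build a scaling function from a reference set of descriptor values.

    Assumed form: linear, mapping the reference 10th percentile to 0
    and the 90th percentile to 1. Returns (scale, p10, p90).
    """
    p10 = percentile(reference_values, 10)
    p90 = percentile(reference_values, 90)
    if p90 == p10:
        raise ValueError("reference values have no spread between percentiles")

    def scale(x):
        return (x - p10) / (p90 - p10)

    return scale, p10, p90


def is_outlier(x, p10, p90):
    """Flag a descriptor value that falls outside the 10th-90th percentile range."""
    return x < p10 or x > p90


# Toy reference distribution for one hypothetical descriptor: 0, 1, ..., 100.
reference = list(range(101))
scale, p10, p90 = make_scale(reference)
mid_scaled = scale(50)
```

Because the scale is fixed by the reference set rather than by the data being analyzed, values from different data sets land on the same axis and can be compared directly.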


Process Design: Stage 1 of the FDA Process Validation Guidance

Burdick

LeBlond

Pfahler

et al. 2017



Lead Discovery Using Stochastic Cluster Analysis (SCA): A New Method for Clustering Structurally Similar Compounds
