Timothy B. Dunn scite author profile

The quantification of chemical diversity has many applications in drug discovery, organic chemistry, food, and natural product chemistry, to name a few. As the size of the chemical space is expanding rapidly, it is imperative to develop efficient methods to quantify the diversity of large and ultralarge chemical libraries and visualize their mutual relationships in chemical space. Herein, we show an application of our recently introduced extended similarity indices to measure the fingerprint-based diversity of 19 chemical libraries typically used in drug discovery and natural products research with over 18 million compounds. Based on this concept, we introduce the Chemical Library Networks (CLNs) as a general and efficient framework to represent visually the chemical space of large chemical libraries providing a global perspective of the relation between the libraries. For the 19 compound libraries explored in this work, it was found that the (extended) Tanimoto index offers the best description of extended similarity in combination with RDKit fingerprints. CLNs are general and can be explored with any structure representation and similarity coefficient for large chemical libraries.

Extended continuous similarity indices: theory and application for QSAR descriptor selection

Rácz

Dunn

Bajusz

et al. 2022

J Comput Aided Mol Des

Extended (or n-ary) similarity indices have been recently proposed to extend the comparative analysis of binary strings. Going beyond the traditional notion of pairwise comparisons, these novel indices allow comparing any number of objects at the same time. This results in a remarkable efficiency gain with respect to other approaches, since now we can compare N molecules in O(N) instead of the common quadratic O(N²) timescale. This favorable scaling has motivated the application of these indices to diversity selection, clustering, phylogenetic analysis, chemical space visualization, and post-processing of molecular dynamics simulations. However, the current formulation of the n-ary indices is limited to vectors with binary or categorical inputs. Here, we present the further generalization of this formalism so it can be applied to numerical data, i.e. to vectors with continuous components. We discuss several ways to achieve this extension and present their analytical properties. As a practical example, we apply this formalism to the problem of feature selection in QSAR and prove that the extended continuous similarity indices provide a convenient way to discern between several sets of descriptors.

Recently, we have introduced several methodological frameworks to extend the usage of similarity measures beyond the common cases mentioned above. Most importantly, we have demonstrated that the mathematical expansion of the core concepts of similarity measures can provide a way to quantify the similarity of an arbitrary number of objects at the same time. We first showed this on binary (molecular) fingerprints: the resulting similarity measures were termed extended (or n-ary) similarity measures [15]. They employ the core concept of similarity and dissimilarity counters, which have replaced the a, b, c and d terms that are commonly applied in the well-known, pairwise definitions of the similarity measures to describe the number of bit positions where two fingerprints have co-occurring one (a) or zero (d) bits, or a one bit that is exclusive to either of the fingerprints (b and c). In our framework, the 1-similarity, 0-similarity, and dissimilarity counters express the number of bit positions where the number of co-occurring one (or zero) bits is above, or below, a predefined coincidence threshold, respectively. For pairwise comparisons, these generalizations naturally revert to the well-known definitions of the classical, pairwise similarity measures.

We have shown that the new methodology is not only computationally efficient, scaling as O(n) with the number of compared objects n, but it can be successfully applied for tasks such as diversity selection, clustering, as well as the visualization of large sections of chemical space [16][17][18][19]. A further generalization involved the extension of this framework to allow for more than two possible characters (t = 2) in an object (vector), opening the possibility to apply the extended similarity measures in bioinformatics, for the comparison of nucleotide (t = 4) or prot...
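The counter idea described above can be illustrated with a minimal Python sketch. It is a simplified illustration, not the authors' implementation: the function names are hypothetical, the default threshold of n mod 2 is an assumption, and the counters are left unweighted, whereas the published indices may apply additional weighting. Each bit position is classified from its column sum alone, so the cost grows linearly with the number of fingerprints.

```python
def extended_counters(fingerprints, threshold=None):
    """Count 1-similarity, 0-similarity and dissimilarity bit positions.

    fingerprints: list of equal-length lists of 0/1 bits.
    threshold: coincidence threshold; assumed here to default to n mod 2.
    """
    n = len(fingerprints)
    if threshold is None:
        threshold = n % 2  # assumption made for this sketch
    # One pass over all fingerprints: number of "on" bits per position.
    column_sums = [sum(column) for column in zip(*fingerprints)]
    one_sim = zero_sim = dissim = 0
    for k in column_sums:
        delta = 2 * k - n  # ones minus zeros at this position
        if delta > threshold:
            one_sim += 1      # position dominated by co-occurring ones
        elif -delta > threshold:
            zero_sim += 1     # position dominated by co-occurring zeros
        else:
            dissim += 1       # no sufficient coincidence
    return one_sim, zero_sim, dissim


def extended_tanimoto(fingerprints, threshold=None):
    """Unweighted Tanimoto-like ratio built from the counters."""
    one_sim, _, dissim = extended_counters(fingerprints, threshold)
    denominator = one_sim + dissim
    return one_sim / denominator if denominator else 0.0


pair = [[1, 1, 0, 0],
        [1, 0, 1, 0]]
print(extended_counters(pair))   # counters play the roles of a, d and b + c
print(extended_tanimoto(pair))   # equals the pairwise Tanimoto a / (a + b + c)
```

For two fingerprints and a threshold of zero, the three counters reduce to a, d and b + c, so the ratio coincides with the classical pairwise Tanimoto coefficient, in line with the reversion property stated in the text.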

Surface Reactions of Low-Energy Argon Ions with Organometallic Precursors

Bilgilisoy

Thorman

et al. 2020

J. Phys. Chem. C

A combination of in situ X-ray photoelectron spectroscopy and mass spectrometry has been used to elucidate the elementary surface reactions initiated by the interaction of low-energy (860 eV) argon ions with three organometallic precursors [Ru(CO)4I2, Co(CO)3NO, and WN(NMe2)3]. The effects of ion exposure on each precursor can be described by a largely sequential series of surface reactions. The initial step involves ion-induced decomposition of the precursor to create a nonvolatile deposit, followed by physical sputtering of the atoms in the deposit. For the precursors that contain CO ligands [Ru(CO)4I2 and Co(CO)3NO], ion-induced decomposition is accompanied by desorption of the majority of the CO groups. This is in marked contrast to previous studies of low-energy electron-induced reactions with the same precursors where precursor decomposition yielded only partial desorption of the CO ligands. Conversely, argon ion bombardment of WN(NMe2)3 led to decomposition without ligand loss. For all three precursors, the initial ion-induced decomposition step was not accompanied by significant desorption of intact precursor molecules, while during subsequent physical sputtering of the deposited atoms, ligand-derived organic and inorganic contaminants were removed at higher rates than the metals. This indicates that controlled ion beam deposition conditions could be used to produce deposits with high metal contents from all three precursors. Comparison of low-energy electron-induced reactions of these three precursors with results of this investigation indicates that secondary electrons do not play an important role in the deposition process, but rather precursor decomposition occurs via efficient ion–molecule energy transfer. These reactions are discussed in the context of focused ion beam-induced deposition.

Exploring activity landscapes with extended similarity: is Tanimoto enough?

Miranda‐Quintana

Dunn

López

et al. 2023

Molecular Informatics

Understanding structure-activity landscapes is essential in drug discovery. Similarly, it has been shown that the presence of activity cliffs in compound data sets can have a substantial impact not only on the design progress but also can influence the predictive ability of machine learning models. With the continued expansion of the chemical space and the currently available large and ultra-large libraries, it is imperative to implement efficient tools to analyze the activity landscape of compound data sets rapidly. The goal of this study is to show the applicability of the n-ary indices to quantify the structure-activity landscapes of large compound data sets using different types of structural representation rapidly and efficiently. We also discuss how a recently introduced medoid algorithm provides the foundation to finding optimum correlations between similarity measures and structure-activity rankings. The applicability of the n-ary indices and the medoid algorithm is shown by analyzing the activity landscape of 10 compound data sets with pharmaceutical relevance using three fingerprints of different designs, 16 extended similarity indices, and 11 coincidence thresholds.

Exploring activity landscapes with extended similarity: is Tanimoto enough?

Dunn

López-López

Kim

et al. 2023

Preprint

Understanding structure-activity landscapes is essential in drug discovery. Similarly, it has been shown that the presence of activity cliffs in compound data sets can have a substantial impact not only on the design progress but also can influence the predictive ability of machine learning models. With the continued expansion of the chemical space and the currently available large and ultra-large libraries, it is imperative to implement efficient tools to analyze the activity landscape of compound data sets rapidly. The goal of this study is to show the applicability of the n-ary indices to quantify the structure-activity landscapes of large compound data sets using different types of structural representation rapidly and efficiently. We also discuss how a recently introduced medoid algorithm provides the foundation to finding optimum correlations between similarity measures and structure-activity rankings. The applicability of the n-ary indices and the medoid algorithm is shown by analyzing the activity landscape of 10 compound data sets with pharmaceutical relevance using three fingerprints of different designs, 16 extended similarity indices, and 11 coincidence thresholds.

Diversity and Chemical Library Networks of Large Data Sets
