Identifying and purchasing new small molecules to test in biological assays is enabling for ligand discovery, but as purchasable chemical space grows into the tens of billions of inexpensive make-on-demand compounds, simply searching this space becomes a major challenge. We have therefore developed ZINC20, a new version of ZINC with two major new features: billions of new molecules and new methods to search them. As a fully enumerated database, ZINC can be searched precisely using explicit atomic-level graph-based methods, such as SmallWorld for similarity and Arthor for pattern and substructure search, as well as 3D methods such as docking. Analysis of the new make-on-demand compound sets by these and related tools reveals startling features. For instance, over 97% of the core Bemis–Murcko scaffolds in make-on-demand libraries are unavailable from "in-stock" collections. Correspondingly, the number of new Bemis–Murcko scaffolds is rising almost as a linear fraction of the elaborated molecules. Thus, an 88-fold increase in the number of molecules in the make-on-demand versus the in-stock sets is built upon a 16-fold increase in the number of Bemis–Murcko scaffolds. The make-on-demand library is also more structurally diverse than physical libraries, with a massive increase in disc- and sphere-shaped molecules. The new system is freely available at .
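As a back-of-the-envelope check on the ZINC20 figures (our arithmetic, assuming the 88-fold and 16-fold numbers compare the same pair of collections; the symbols are ours), the average number of molecules per Bemis–Murcko scaffold is about 5.5 times higher in the make-on-demand set, and at most 1/16 of make-on-demand scaffolds can also occur in stock, which is compatible with the reported >97% unavailability:

```latex
\frac{N_{\text{mol}}^{\text{MoD}} / N_{\text{mol}}^{\text{stock}}}
     {N_{\text{scaf}}^{\text{MoD}} / N_{\text{scaf}}^{\text{stock}}}
  = \frac{88}{16} = 5.5,
\qquad
\frac{\lvert \text{scaffolds shared} \rvert}{N_{\text{scaf}}^{\text{MoD}}}
  \le \frac{N_{\text{scaf}}^{\text{stock}}}{N_{\text{scaf}}^{\text{MoD}}}
  = \frac{1}{16} \approx 6.3\%
```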
Enrichment of ligands versus property-matched decoys is widely used to test and optimize docking library screens. However, the unconstrained optimization of enrichment alone can mislead, leading to false confidence in prospective performance. This can arise from over-optimizing enrichment against property-matched decoys without considering the full spectrum of molecules encountered in a true large-library screen. Adding decoys representing charge extrema helps mitigate over-optimizing for electrostatic interactions. Adding decoys that represent the overall characteristics of the library to be docked allows one to sample molecules that are not represented by ligands and property-matched decoys but that one will encounter in a prospective screen. We develop here an optimized version of the DUD-E set (DUDE-Z), as well as Extrema and sets representing broad features of the library (Goldilocks). We also explore the variability that one can encounter in enrichment calculations and how that can temper one's confidence in small enrichment differences. The new tools and new decoy sets are freely available at http://tldr.docking.org and http://dudez.docking.org.
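To make the point about variability concrete, here is a minimal bootstrap sketch of how much an enrichment metric can move when the ligand/decoy set is resampled. It is not the TLDR or DUDE-Z code: the metric is a plain enrichment factor (EF) at the top of the ranked list (an assumption, since the abstract does not name a metric), lower docking scores are assumed to rank better, and the function names are hypothetical.

```python
import random
import statistics
from typing import Sequence, Tuple


def enrichment_factor(scores: Sequence[float], is_ligand: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """EF at the top `fraction` of the list; lower score = better rank (assumed)."""
    ranked = sorted(zip(scores, is_ligand), key=lambda pair: pair[0])
    n_total = len(ranked)
    n_top = max(1, int(round(fraction * n_total)))
    ligands_total = sum(1 for flag in is_ligand if flag)
    ligands_top = sum(1 for _, flag in ranked[:n_top] if flag)
    # EF = hit rate in the top slice divided by the hit rate of the whole set
    return (ligands_top / n_top) / (ligands_total / n_total)


def bootstrap_ef_interval(scores: Sequence[float], is_ligand: Sequence[bool],
                          fraction: float = 0.01, n_boot: int = 1000,
                          seed: int = 0) -> Tuple[float, float]:
    """Approximate 95% bootstrap interval of the EF by resampling molecules."""
    rng = random.Random(seed)
    n = len(scores)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        sampled_labels = [is_ligand[i] for i in idx]
        if not any(sampled_labels):
            continue  # no ligands drawn: EF is undefined for this resample
        sampled_scores = [scores[i] for i in idx]
        values.append(enrichment_factor(sampled_scores, sampled_labels, fraction))
    cuts = statistics.quantiles(values, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    # Hypothetical toy data: 5 ligands with the 5 best scores among 100 molecules.
    toy_scores = list(range(100))
    toy_labels = [True] * 5 + [False] * 95
    print(enrichment_factor(toy_scores, toy_labels, fraction=0.05))
    print(bootstrap_ef_interval(toy_scores, toy_labels, fraction=0.05))
```

Even on this idealized toy set the resampled EF spreads below its point value, which is the kind of spread that should temper conclusions drawn from small enrichment differences.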
Purchasable chemical space has grown rapidly into the tens of billions of molecules, providing unprecedented opportunities for ligand discovery but straining the tools that might exploit these molecules at scale. We have therefore developed ZINC-22, a database of commercially accessible small molecules derived from multi-billion-scale make-on-demand libraries. The new database and tools enable analog searching in this vast new space via a facile GUI, CartBlanche, drawing on similarity methods that scale sublinearly in the number of molecules. The new library also uses data organization methods that enable rapid lookup of molecules and their physical properties, including conformations, partial atomic charges, cLogP values, and solvation energies, all crucial for molecular docking; such lookups had become slow with the older database organization of previous versions of ZINC. As the libraries have continued to grow, we have asked whether molecular diversity has suffered, for instance, because certain scaffolds have come to dominate via easy analoging. This has not occurred thus far: chemical diversity continues to grow with database size, with a one-log-unit increase in Bemis–Murcko scaffolds for every two-log-unit increase in database size. Most new scaffolds come from compounds with the highest heavy atom count. Finally, we consider the implications for databases like ZINC as the libraries grow toward and beyond the trillion-molecule range. ZINC is freely available to everyone and may be accessed at cartblanche22.docking.org, via Globus, and in the Amazon AWS and Oracle OCI clouds.
Purchasable chemical space has grown rapidly into the tens of billions of molecules, providing unprecedented opportunities for ligand discovery but also straining the tools that might exploit these molecules at scale. We have therefore developed ZINC-22, a database of commercially accessible small molecules derived from multi-billion-scale make-on-demand libraries. The new database and tools enable analog searching in this vast new space via a facile GUI, CartBlanche, drawing on similarity methods that scale sub-linearly in the number of molecules. The new library also uses data organization methods that enable rapid lookup of molecules and their physical properties, including conformations, partial atomic charges, cLogP values, and solvation energies, all crucial for molecular docking; such lookups had become slow with the older database organization of previous versions of ZINC. As the libraries have continued to grow, we have asked whether molecular diversity has suffered, for instance, because certain scaffolds have come to dominate via easy analoging. This has not occurred thus far: chemical diversity continues to grow with database size, with a one-log-unit increase in Bemis–Murcko scaffolds for every two-log-unit increase in database size. Most new scaffolds come from compounds with the highest heavy atom count. Finally, we consider the implications for databases like ZINC as the libraries grow towards and beyond the trillion-molecule range. ZINC is freely available to everyone and may be accessed at cartblanche22.docking.org, via Globus, and in the Amazon AWS and Oracle OCI clouds.
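The ZINC-22 scaling statement can be read as a power law: one log unit of scaffolds per two log units of molecules is a slope of 0.5 on a log-log plot, that is, S ∝ N^(1/2), so each 100-fold growth in molecules would bring roughly a 10-fold growth in scaffolds if the trend holds. The Python sketch below estimates such a slope by least squares; the data are synthetic and the function names hypothetical, not ZINC-22 measurements or code.

```python
import math
from typing import Sequence


def loglog_slope(db_sizes: Sequence[float], scaffold_counts: Sequence[float]) -> float:
    """Least-squares slope of log10(scaffold count) versus log10(database size)."""
    xs = [math.log10(n) for n in db_sizes]
    ys = [math.log10(s) for s in scaffold_counts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var


def projected_fold_increase(size_fold: float, slope: float) -> float:
    """Scaffold fold change implied by S ∝ N**slope for a given fold change in N."""
    return size_fold ** slope


if __name__ == "__main__":
    # Hypothetical, noise-free data following S = 2 * sqrt(N); NOT values from ZINC-22.
    sizes = [10.0 ** k for k in range(6, 11)]
    counts = [2.0 * math.sqrt(n) for n in sizes]
    slope = loglog_slope(sizes, counts)
    print(round(slope, 6))  # → 0.5
    print(round(projected_fold_increase(100.0, slope), 3))  # → 10.0
```

Fitting in log space keeps the power-law exponent as a simple linear slope; real scaffold counts would add scatter, so the fitted slope is only an estimate of the trend.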
cis-β-Bromostyrene derivatives were synthesized stereospecifically from cinnamic acids through β-lactone intermediates. The synthetic sequence did not require the purification of the β-lactone intermediates although they were found to be stable and readily purified in most cases.
Molecular docking is widely used to leverage protein structure for ligand discovery, but the technique retains important liabilities that make it challenging to deploy on a large scale. Notwithstanding multiple attempts at automation, molecular docking continues to require the guidance of an expert, thus limiting its use by many investigators who could benefit from it. To make docking more accessible, we have created new software that allows us to investigate the automation of molecular docking screens. Our method currently requires known ligands and decoys for model evaluation. Of 42 DUDE-Z targets, all show automated docking results that are better than those of our previous automated protocol. The new system is available both as part of the UCSF DOCK 3.8 package, which is free to academics, and via our website tldr.docking.org/start/dockopt, which is free to everyone (free registration required).
Molecular docking is a widely used technique for leveraging protein structure in ligand discovery, but it remains difficult to use because of limitations that have not been adequately addressed. Despite some progress towards automation, docking still requires expert guidance, hindering its adoption by a broader range of investigators. To make docking more accessible, we have developed a new command-line utility called dockopt, which automates the creation, evaluation, and optimization of docking models prior to their deployment in large-scale prospective screens. dockopt outperforms our previous automated pipeline across all 43 targets in the DUDE-Z benchmark, and the generated models for 86% of targets demonstrate sufficient enrichment to warrant their use in prospective screens, with normalized LogAUC values of at least 15%. dockopt is available as part of the Python package pydock3 included in the UCSF DOCK 3.8 distribution, which is free for academic researchers at https://dock.compbio.ucsf.edu, and free for everyone upon registration at https://tldr.docking.org.
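The 15% cutoff above refers to enrichment measured on a semi-logarithmic ROC curve. The sketch below shows one common way to compute such a value, assuming the LogAUC definition in which the area under the ROC curve is taken over a log10 false-positive axis from 0.001 to 1 and divided by 3, and assuming that "normalized" means subtracting the roughly 14.5% area of a random ranking. dockopt's exact normalization is not stated in the abstract and should be checked against pydock3 itself; the function names here are hypothetical.

```python
import math
from typing import Sequence

LAMBDA = 1e-3  # assumed lower bound of the false-positive-rate axis


def log_auc(scores: Sequence[float], is_ligand: Sequence[bool], lam: float = LAMBDA) -> float:
    """Area under the step ROC curve with a log10 x-axis over [lam, 1], divided by log10(1/lam).

    Lower scores rank better (as with docking energies); ties keep input order.
    """
    n_lig = sum(1 for flag in is_ligand if flag)
    n_dec = len(is_ligand) - n_lig
    if n_lig == 0 or n_dec == 0:
        raise ValueError("need at least one ligand and one decoy")
    ranked = sorted(zip(scores, is_ligand), key=lambda pair: pair[0])
    found_lig = found_dec = 0
    area = 0.0
    for _, flag in ranked:
        if flag:
            found_lig += 1  # vertical ROC step: true-positive rate rises, no area added
            continue
        x_old = found_dec / n_dec
        found_dec += 1
        x_new = found_dec / n_dec
        if x_new > lam:
            # horizontal ROC step at constant TPR, integrated over log10(FPR) within [lam, 1]
            tpr = found_lig / n_lig
            area += tpr * (math.log10(x_new) - math.log10(max(x_old, lam)))
    return area / math.log10(1.0 / lam)


def random_log_auc(lam: float = LAMBDA) -> float:
    """LogAUC of the diagonal ROC curve y = x (random ranking), about 0.145 for lam = 1e-3."""
    return (1.0 - lam) / (math.log(10.0) * math.log10(1.0 / lam))


if __name__ == "__main__":
    scores = [-40.0, -35.0, -30.0, -20.0, -10.0]  # hypothetical docking scores
    labels = [True, False, True, False, False]
    value = log_auc(scores, labels)
    adjusted = value - random_log_auc()
    print(f"LogAUC={value:.3f} adjusted={adjusted:.3f} passes_15pct={adjusted >= 0.15}")
```

The log-scaled x-axis weights the earliest part of the ranking most heavily, which is where a large prospective screen actually picks compounds for testing.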