Drug molecules consist of a few tens of atoms connected by covalent bonds. How many such molecules are possible in total and what is their structure? This question is of pressing interest in medicinal chemistry to help solve the problems of drug potency, selectivity, and toxicity and reduce attrition rates by pointing to new molecular series. To better define the unknown chemical space, we have enumerated 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens forming the chemical universe database GDB-17, covering a size range containing many drugs and typical for lead compounds. GDB-17 contains millions of isomers of known drugs, including analogs with high shape similarity to the parent drug. Compared to known molecules in PubChem, GDB-17 molecules are much richer in nonaromatic heterocycles, quaternary centers, and stereoisomers, densely populate the third dimension in shape space, and represent many more scaffold types.
GDB-13 enumerates small organic molecules containing up to 13 atoms of C, N, O, S, and Cl following simple chemical stability and synthetic feasibility rules. With 977,468,314 structures, GDB-13 is the largest publicly available small organic molecule database to date.
Clinical specimens are each inherently unique, limited and non-renewable. As such, small samples such as tissue biopsies are often completely consumed after a limited number of analyses. Here we present a method that enables fast and reproducible conversion of a small amount of tissue (approximating the quantity obtained by a biopsy) into a single, permanent digital file representing the mass spectrometry-measurable proteome of the sample. The method combines pressure cycling technology (PCT) and SWATH mass spectrometry (MS), and the resulting proteome maps can be analyzed, re-analyzed, compared and mined in silico to detect and quantify specific proteins across multiple samples. We used this method to process and convert 18 biopsy samples from 9 renal cell carcinoma patients into SWATH-MS fragment ion maps. From these proteome maps we detected and quantified more than 2,000 proteins with a high degree of reproducibility across all samples. The identified proteins clearly separated tumorous kidney tissues from healthy tissue, and differentiated distinct histomorphological kidney cancer subtypes.
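In SWATH-MS the instrument cycles through consecutive precursor isolation windows and fragments every precursor that falls inside each window, which is why the resulting fragment ion maps can be re-mined later for proteins that were not targeted at acquisition time. The sketch below only illustrates that window logic. It assumes a commonly described layout of 32 contiguous 25 m/z windows over 400–1200 m/z; the abstract does not state which scheme this study used, real methods often use overlapping or variable-width windows, and the precursor values are hypothetical.

```python
import math


def swath_windows(start_mz, end_mz, width):
    """Contiguous (low, high) precursor isolation windows covering [start_mz, end_mz).

    Fixed-width, non-overlapping windows are an illustrative assumption.
    """
    n = math.ceil((end_mz - start_mz) / width)
    return [(start_mz + i * width, min(start_mz + (i + 1) * width, end_mz))
            for i in range(n)]


def assign_to_windows(precursor_mzs, start_mz, end_mz, width):
    """Map window index -> precursor m/z values co-isolated (and co-fragmented) in it.

    Precursors outside the covered m/z range are not recorded at all.
    """
    groups = {}
    for mz in precursor_mzs:
        if start_mz <= mz < end_mz:
            groups.setdefault(int((mz - start_mz) // width), []).append(mz)
    return groups


windows = swath_windows(400, 1200, 25)
print(len(windows), windows[0], windows[-1])
# Hypothetical precursor m/z values; 1300.0 falls outside the acquired range.
print(assign_to_windows([512.3, 520.0, 801.7, 1300.0], 400, 1200, 25))
```

Precursors that share a window (here 512.3 and 520.0) produce overlapping fragment spectra, which is why SWATH data are usually interpreted with targeted extraction against spectral libraries rather than by reading individual spectra.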
The degree and the origins of quantitative variability of most human plasma proteins are largely unknown. Because the twin study design provides a natural opportunity to estimate the relative contribution of heritability and environment to different traits in the human population, we applied the highly accurate and reproducible SWATH mass spectrometry technique to quantify 1,904 peptides defining 342 unique plasma proteins in 232 plasma samples collected longitudinally from pairs of monozygotic and dizygotic twins at intervals of 2–7 years, and partitioned the observed total quantitative variability among its root causes: genetic, environmental, and longitudinal factors. The data indicate that different proteins show vastly different patterns of abundance variability among humans and that genetic control and longitudinal variation affect protein levels and biological processes to different degrees. The data further strongly suggest that the plasma concentrations of clinical biomarkers need to be calibrated against genetic and temporal factors. Moreover, we identified 13 cis-SNPs significantly influencing the level of specific plasma proteins. These results therefore have immediate implications for the effective design of blood-based biomarker studies.
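To make "proportioning variability to genes and environment" concrete, the classic back-of-the-envelope twin decomposition is Falconer's formula: additive genetic share a² = 2(r_MZ − r_DZ), shared-environment share c² = 2r_DZ − r_MZ, and unique-environment share e² = 1 − r_MZ. This is not necessarily the variance-component model the authors used (theirs also separates longitudinal variation across the sampling interval), and the Pearson correlation on ordered twin pairs below is a simplification standing in for the intraclass correlation a real analysis would use. All protein values are hypothetical.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation between paired twin measurements (twin 1 vs twin 2)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


def falconer(r_mz, r_dz):
    """Falconer's estimates of variance fractions from MZ and DZ twin correlations.

    Returns (a2, c2, e2): additive genetic, shared environment, and unique
    environment (which also absorbs measurement error) shares of total variance.
    The three shares always sum to 1.
    """
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2


# Hypothetical log-abundances of one plasma protein in five MZ and five DZ pairs.
mz_twin1 = [1.0, 2.1, 2.9, 4.2, 5.0]
mz_twin2 = [1.1, 2.0, 3.1, 4.0, 5.1]
dz_twin1 = [1.2, 2.0, 3.3, 3.9, 4.8]
dz_twin2 = [1.9, 1.7, 2.6, 4.4, 3.9]

r_mz = pearson(mz_twin1, mz_twin2)
r_dz = pearson(dz_twin1, dz_twin2)
print(falconer(r_mz, r_dz))
```

With real data, small sample sizes can push these estimates outside [0, 1], one reason model-based variance decompositions are preferred.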
The chemical space is the ensemble of all possible molecules, which is believed to contain at least 10^60 organic molecules below 500 Da of possible interest for drug discovery. This review summarizes the development of the chemical space concept from enumerating acyclic hydrocarbons in the 1800s to the recent assembly of the chemical universe database GDB. Chemical space travel algorithms can be used to explore defined regions of chemical space by generating focused virtual libraries. Maps of the chemical space are produced from property spaces visualized by principal component analysis or by self-organizing maps, and from structural analyses such as the scaffold-tree or the MQN-system. Virtual screening of virtual chemical space followed by synthesis and testing of the best hits leads to the discovery of new drug molecules.
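The historical starting point mentioned above, enumerating acyclic hydrocarbons, amounts to counting the distinct carbon skeletons of the alkanes C_nH_(2n+2): unlabeled trees in which no carbon has more than four carbon neighbours. The brute-force sketch below grows trees one carbon at a time and removes duplicates with a canonical string. It counts constitutional isomers only (no stereoisomers) and is an illustration of the idea, not the algorithm used to build GDB.

```python
def encode(adj, root, parent=-1):
    """AHU-style canonical string of the subtree rooted at `root`."""
    children = sorted(encode(adj, c, root) for c in adj[root] if c != parent)
    return "(" + "".join(children) + ")"


def canonical(adj):
    """Canonical form of an unrooted tree: the smallest of its rooted encodings."""
    return min(encode(adj, r) for r in range(len(adj)))


def alkane_skeletons(n_carbons):
    """One representative carbon skeleton per constitutional isomer of C_nH_(2n+2).

    Every tree with n+1 nodes arises by attaching a leaf to some tree with n
    nodes, so growing one representative per isomorphism class is complete.
    A carbon may receive a new neighbour only while it has fewer than 4.
    """
    level = {canonical([[]]): [[]]}  # methane: a single isolated carbon
    for _ in range(n_carbons - 1):
        grown = {}
        for adj in level.values():
            for v in range(len(adj)):
                if len(adj[v]) < 4:
                    new = [list(nbrs) for nbrs in adj]
                    new.append([v])          # new carbon bonded to v
                    new[v].append(len(adj))  # index of the new carbon
                    grown.setdefault(canonical(new), new)
        level = grown
    return list(level.values())


for n in range(1, 11):
    print(n, len(alkane_skeletons(n)))
```

For n = 1 to 10 this should reproduce the known isomer counts 1, 1, 1, 2, 3, 5, 9, 18, 35, 75 (decane has 75 constitutional isomers). The rapid growth already hints at why enumerations such as GDB-13 and GDB-17 reach hundreds of millions to billions of structures.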
The periodic table classifies elements by increasing atomic number in periods following the principal quantum number, and allows their physicochemical properties to be rationalized.[1] Herein, we propose a related system for organic molecules based on 42 molecular quantum numbers (MQNs), defined here as counts for simple structural features such as atom, bond and ring types, creating a multidimensional grid called MQN space. In analogy to the elements and their isotopes grouped in each entry of the periodic table, MQN isomers have identical MQNs and occupy the same position in MQN space. The MQN system can analyze large molecular databases and cluster compounds with similar structures, physicochemical properties and bioactivities, as illustrated for the databases ZINC [2] and GDB-11.
In the field of medicinal chemistry, the chemical space describes the ensemble of all organic molecules to be considered when searching for new drugs (estimated >10^60 molecules), as well as the property spaces in which these molecules are placed for the sake of describing them. Molecules can be enumerated computationally by the millions, which was first undertaken in the field of computer‐aided structure elucidation. Scoring the enumerated virtual libraries by virtual screening has recently become an attractive strategy to prioritize compounds for synthesis and testing. Enumeration methods include combinatorial linking of fragments, genetic algorithms based on cycles of enumeration and selection by ligand‐based or target‐based scoring functions, and exhaustive enumeration from first principles. The chemical space of molecules following simple rules of chemical stability and synthetic feasibility has been enumerated up to 13 atoms of C, N, O, Cl, S, forming the GDB‐13 database with 977 million structures. The database has been organized in a 42‐dimensional chemical space using molecular quantum numbers (MQN) as descriptors, which can be visualized by projection in two dimensions by principal component analysis, and searched within seconds using a Web browser available at www.gdb.unibe.ch. © 2012 John Wiley & Sons, Ltd. This article is categorized under: Computer and Information Science > Chemoinformatics
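Projecting a 42-dimensional MQN space onto two dimensions by principal component analysis means finding the two directions of largest variance and expressing each molecule by its coordinates along them. The sketch below does this in pure Python with power iteration on the sample covariance matrix, keeping the second component orthogonal to the first. It is an illustration under the assumption that each row is one molecule's descriptor vector (at least two rows are required); a production pipeline would normally call an optimized linear-algebra library instead, and the test data are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pca_2d(rows, iters=500):
    """Project rows (equal-length numeric vectors, len(rows) >= 2) onto their
    first two principal components, using power iteration with Gram-Schmidt."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    def orthonormalize(v, basis):
        """Remove components along `basis`, then normalize; None if nothing is left."""
        for b in basis:
            p = _dot(v, b)
            v = [vi - p * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(_dot(v, v))
        return [vi / norm for vi in v] if norm > 1e-12 else None

    components = []
    for _ in range(min(2, d)):
        # Deterministic start vectors; unit axes are fallbacks if the first
        # candidate happens to lie in the span of earlier components.
        candidates = [[1.0 + 0.1 * k for k in range(d)]]
        candidates += [[float(i == k) for i in range(d)] for k in range(d)]
        v = next(u for u in (orthonormalize(c, components) for c in candidates)
                 if u is not None)
        for _ in range(iters):
            w = orthonormalize([_dot(row, v) for row in cov], components)
            if w is None:  # no variance left outside the earlier components
                break
            v = w
        components.append(v)
    return [[_dot(x, c) for c in components] for x in centered]


print(pca_2d([[-2, 0, 0], [2, 0, 0], [0, -1, 0], [0, 1, 0]]))
```

The sign of each principal axis is arbitrary, so maps produced this way may appear mirrored relative to another tool's output without any difference in content.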
The chemical universe database GDB-13, which enumerates 977 million organic molecules up to 13 atoms of C, N, O, S and Cl following simple chemical stability and synthetic feasibility rules, represents a vast reservoir for new fragments. GDB-13 was classified using the MQN-system discussed in the preceding paper for the analysis of PubChem fragments. Two hundred and fifty-five subsets of GDB-13 were generated by the combinatorial use of eight restrictive criteria, including fragment-like ("rule of three") and scaffold-like (no acyclic carbon atoms) filters. Virtual screening for analogs of 15 commercial drugs of 13 non-hydrogen atoms or fewer shows that retrieving MQN-neighbors of a query molecule from GDB-13 or its subsets provides on average a 38-fold enrichment in structural analogs (Daylight-type substructure fingerprint Tanimoto T_SF ≥ 0.7), and a 75-fold enrichment in shape-similar analogs (ROCS TanimotoCombo score ≥ 1.4). An MQN-searchable version of GDB-13 is provided at www.gdb.unibe.ch.
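The enrichment experiment described above can be sketched as: rank database entries by distance to the query in MQN space, take the k nearest, and compare the fraction of fingerprint analogs (Tanimoto ≥ 0.7) among them with their fraction in the whole database. This sketch assumes a city-block (Manhattan) distance between MQN vectors, fingerprints modeled as Python sets of on-bit indices rather than an actual Daylight implementation, and the standard enrichment-factor formula, which may differ in detail from the paper's definition. ROCS shape similarity is not modeled, and the database entries are hypothetical.

```python
def city_block(a, b):
    """City-block (Manhattan) distance between two MQN-style count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def enrichment_factor(query_mqn, query_fp, database, k, threshold=0.7):
    """Enrichment of fingerprint analogs among the k MQN-nearest neighbours.

    database: list of (mqn_vector, fingerprint_set) entries.
    EF = (analogs in top k / k) / (analogs in database / database size).
    """
    is_analog = [tanimoto(query_fp, fp) >= threshold for _, fp in database]
    total = sum(is_analog)
    if total == 0:
        raise ValueError("no analogs in database; enrichment is undefined")
    ranked = sorted(range(len(database)),
                    key=lambda i: city_block(query_mqn, database[i][0]))
    hits = sum(is_analog[i] for i in ranked[:k])
    return (hits / k) / (total / len(database))


# Hypothetical database: two analogs close to the query in MQN space,
# eight unrelated entries farther away.
db = [((5, 1, 0), {1, 2, 3, 4}), ((5, 1, 1), {1, 2, 3, 4, 5})]
db += [((10 + i, 0, 0), {100 + i}) for i in range(8)]
print(enrichment_factor((5, 1, 0), {1, 2, 3, 4}, db, k=2))
```

Here both analogs are retrieved in the top 2 of 10 entries, so the factor is (2/2) / (2/10) = 5. In a database the size of GDB-13, where analogs of a given drug are a tiny fraction, the same formula can yield the much larger enrichments reported above.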