Drug molecules consist of a few tens of atoms connected by covalent bonds. How many such molecules are possible in total and what is their structure? This question is of pressing interest in medicinal chemistry to help solve the problems of drug potency, selectivity, and toxicity and reduce attrition rates by pointing to new molecular series. To better define the unknown chemical space, we have enumerated 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens forming the chemical universe database GDB-17, covering a size range containing many drugs and typical for lead compounds. GDB-17 contains millions of isomers of known drugs, including analogs with high shape similarity to the parent drug. Compared to known molecules in PubChem, GDB-17 molecules are much richer in nonaromatic heterocycles, quaternary centers, and stereoisomers, densely populate the third dimension in shape space, and represent many more scaffold types.
GDB-13 enumerates small organic molecules containing up to 13 atoms of C, N, O, S, and Cl following simple chemical stability and synthetic feasibility rules. With 977,468,314 structures, GDB-13 is the largest publicly available small organic molecule database to date.
Clinical specimens are each inherently unique, limited and non-renewable. As such, small samples such as tissue biopsies are often completely consumed after a limited number of analyses. Here we present a method that enables fast and reproducible conversion of a small amount of tissue (approximating the quantity obtained by a biopsy) into a single, permanent digital file representing the mass spectrometry-measurable proteome of the sample. The method combines pressure cycling technology (PCT) and SWATH mass spectrometry (MS), and the resulting proteome maps can be analyzed, re-analyzed, compared and mined in silico to detect and quantify specific proteins across multiple samples. We used this method to process and convert 18 biopsy samples from 9 renal cell carcinoma patients into SWATH-MS fragment ion maps. From these proteome maps we detected and quantified more than 2,000 proteins with a high degree of reproducibility across all samples. The identified proteins clearly separated tumorous kidney tissues from healthy tissue, and differentiated distinct histomorphological kidney cancer subtypes.
The degree and the origins of quantitative variability of most human plasma proteins are largely unknown. Because the twin study design provides a natural opportunity to estimate the relative contributions of heritability and environment to different traits in human populations, we applied the highly accurate and reproducible SWATH mass spectrometry technique to quantify 1,904 peptides defining 342 unique plasma proteins in 232 plasma samples collected longitudinally from pairs of monozygotic and dizygotic twins at intervals of 2–7 years, and apportioned the observed total quantitative variability among its root causes: genetic, environmental, and longitudinal factors. The data indicate that different proteins show vastly different patterns of abundance variability among humans and that genetic control and longitudinal variation affect protein levels and biological processes to different degrees. The data further strongly suggest that the plasma concentrations of clinical biomarkers need to be calibrated against genetic and temporal factors. Moreover, we identified 13 cis-SNPs significantly influencing the level of specific plasma proteins. These results therefore have immediate implications for the effective design of blood-based biomarker studies.
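As a rough illustration of how a twin design can split trait variance into genetic and environmental parts, the sketch below applies the classical Falconer approximation to one protein's abundance values. This is an assumption chosen for illustration only: the abstract does not state which statistical model the authors used, the sketch ignores the longitudinal component entirely, and the function names and toy numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def falconer_decomposition(mz_pairs, dz_pairs):
    """Falconer-style variance split from twin-pair correlations.

    mz_pairs, dz_pairs: lists of (twin_a, twin_b) abundance values for
    monozygotic and dizygotic pairs (hypothetical input layout).
    Returns the heritable (h2), shared-environment (c2), and
    unique-environment (e2) fractions; they sum to 1 by construction.
    """
    r_mz = pearson([a for a, _ in mz_pairs], [b for _, b in mz_pairs])
    r_dz = pearson([a for a, _ in dz_pairs], [b for _, b in dz_pairs])
    return {
        "h2": 2 * (r_mz - r_dz),  # additive genetic fraction
        "c2": 2 * r_dz - r_mz,    # shared (common) environment
        "e2": 1 - r_mz,           # unique environment plus noise
    }


# Toy data, not taken from the study.
mz = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
dz = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
result = falconer_decomposition(mz, dz)
```

With real data the estimates can fall outside the 0 to 1 range when sample sizes are small, which is one reason studies of this kind typically fit mixed-effects or structural-equation models instead of relying on this shortcut.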