Biochemical combinatorial techniques such as phage display, RNA display and oligonucleotide aptamers have proven to be reliable methods for generation of ligands to protein targets. Adapting these techniques to small synthetic molecules has been a long-sought goal. We report the synthesis and interrogation of an 800-million-member DNA-encoded library in which small molecules are covalently attached to an encoding oligonucleotide. The library was assembled by a combination of chemical and enzymatic synthesis, and interrogated by affinity selection. We describe methods for the selection and deconvolution of the chemical display library, and the discovery of inhibitors for two enzymes: Aurora A kinase and p38 MAP kinase.
Complex mixtures of DNA encoded small molecules may be readily interrogated via high-throughput sequencing. These DNA encoded libraries (DELs) are commonly used to discover molecules that interact with pharmaceutically relevant proteins. The chemical diversity displayed by the library is key to successful discovery of potent, novel, and drug-like chemical matter. The small molecule moieties of DELs are generally synthesized through a multistep process, and each chemical step is carried out while the molecule is attached to an encoding DNA oligomer. Hence, library chemical diversity is often limited to DNA compatible synthetic reactions. Herein, protocols for 24 reactions are provided that have been optimized for high-throughput production of DELs. These protocols detail the multistep synthesis of benzimidazoles, imidazolidinones, quinazolinones, isoindolinones, thiazoles, and imidazopyridines. Additionally, protocols are provided for a diverse range of useful chemical reactions including BOC deprotection (under pH neutral conditions), carbamylation, and Sonogashira coupling. Last, step-by-step protocols for synthesizing functionalized DELs from trichloronitropyrimidine and trichloropyrimidine scaffolds are detailed.
Robotic high-throughput compound screening (HTS) and, increasingly, DNA-encoded library (DEL) screening are driving bioactive chemical matter discovery in the post-genome era. HTS enables activity-based investigation of highly complex targets using static compound libraries. Conversely, DEL grants efficient access to novel chemical diversity, although screening is limited to affinity-based selections. Here, we describe an integrated droplet-based microfluidic circuit that directly screens solid-phase DELs for activity. An example screen of a 67,100-member library for inhibitors of the phosphodiesterase autotaxin yielded 35 high-priority structures for nanomole-scale synthesis and validation (20 active), guiding candidate selection for synthesis at scale (5/5 compounds with IC50s 4-10 μM). We further compared activity-based hits with those of an analogous affinity-based DEL selection. This miniaturized screening platform paves the way toward applying DELs to more complex targets (signaling pathways, cellular response), and represents a distributable approach to small molecule discovery.
As a potential target for obesity, human BCATm was screened against more than 14 billion DNA encoded compounds of distinct scaffolds followed by off-DNA synthesis and activity confirmation. As a consequence, several series of BCATm inhibitors were discovered. One representative compound (R)-3-((1-(5-bromothiophene-2-carbonyl)-pyrrolidin-3-yl)oxy)-N-methyl-2′-(methylsulfonamido)-[1,1′-biphenyl]-4-carboxamide (15e) from a novel compound library synthesized via on-DNA Suzuki−Miyaura cross-coupling showed BCATm inhibitory activity with IC50 = 2.0 μM. A protein crystal structure of 15e revealed that it binds to BCATm within the catalytic site adjacent to the PLP cofactor. The identification of this novel inhibitor series plus the establishment of a BCATm protein structure provided a good starting point for future structure-based discovery of BCATm inhibitors.
DNA encoded library (DEL) technology allows for rapid generation of extremely large numbers of small molecules and is often used to find novel chemical starting points for pharmaceutically relevant proteins. DEL selection output consists of a list of small-molecule structures and enrichment levels. It is widely presumed that molecules with greater enrichment will have larger equilibrium association constants, and follow-up efforts are triaged accordingly. Herein we describe a simple mathematical model used to simulate DEL selections. Simulations predict that enrichment levels will correlate poorly with equilibrium association constants when selections use high concentrations of protein or lower quality DELs (high variance in final product synthetic yields). A potentially superior technique is demonstrated to qualitatively assess equilibrium association constants directly from sequencing data. This technique requires conducting selections over a range of protein concentrations, so that the influence of synthetic yield can be accounted for.
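The reasoning in the abstract above can be illustrated with a minimal sketch. It is not the authors' model: it assumes a simple 1:1 equilibrium with protein in large excess over each library member, so that the fraction bound is Ka·[P] / (1 + Ka·[P]), and it treats the sequencing signal as synthetic yield times fraction bound. The function names, association constants, yields, and concentrations are all hypothetical values chosen for illustration.

```python
def fraction_bound(ka, protein_conc):
    """Equilibrium fraction of a library member bound to protein.

    Assumes 1:1 binding with protein in large excess over the ligand.
    ka is in 1/M, protein_conc is in M.
    """
    x = ka * protein_conc
    return x / (1.0 + x)


def simulated_signal(ka, synthetic_yield, protein_conc):
    """Relative amount recovered: amount present (yield) times fraction bound."""
    return synthetic_yield * fraction_bound(ka, protein_conc)


def yield_free_ratio(ka, low_conc, high_conc, synthetic_yield=1.0):
    """Ratio of signals at two protein concentrations.

    The synthetic yield appears in numerator and denominator and cancels,
    so the ratio depends only on ka and the two concentrations.
    """
    low = simulated_signal(ka, synthetic_yield, low_conc)
    high = simulated_signal(ka, synthetic_yield, high_conc)
    return low / high


if __name__ == "__main__":
    # Hypothetical members: a strong binder made in poor yield,
    # and a weak binder made in good yield.
    strong = {"ka": 1e8, "synthetic_yield": 0.1}
    weak = {"ka": 1e6, "synthetic_yield": 0.9}

    for conc in (1e-5, 1e-9):  # 10 uM and 1 nM protein
        s = simulated_signal(protein_conc=conc, **strong)
        w = simulated_signal(protein_conc=conc, **weak)
        print(f"[P] = {conc:.0e} M  strong: {s:.3g}  weak: {w:.3g}")

    # Comparing concentrations removes the yield dependence.
    print(yield_free_ratio(strong["ka"], 1e-9, 1e-5, strong["synthetic_yield"]))
    print(yield_free_ratio(weak["ka"], 1e-9, 1e-5, weak["synthetic_yield"]))
```

Under these assumptions, both members are nearly saturated at the high protein concentration, so the signal there mostly reflects synthetic yield and the weak binder appears more enriched. At the low concentration the ranking reverses, and the low-to-high ratio separates the two members regardless of yield.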
Analysis of physical properties and structural diversity of 57 molecules derived from screening 5–16 DNA encoded libraries against two protein targets. DNA encoded library size is not predictive of productivity.
Simulated screening of DNA encoded libraries indicates that the presence of truncated byproducts complicates the relationship between library member enrichment and equilibrium association constant (these truncates result from incomplete chemical reactions during library synthesis). Further, simulations indicate that some patterns observed in reported experimental data may result from the presence of truncated byproducts in the library mixture and not structure-activity relationships. Potential experimental methods of minimizing the presence of truncates are assessed via simulation; the relationship between enrichment and equilibrium association constant for libraries of differing purities is investigated. Data aggregation techniques are demonstrated that allow for more accurate analysis of screening results, in particular when the screened library contains significant quantities of truncates.
To optimize future DNA-encoded library design, we have attempted to quantify the library size at which the signal becomes undetectable. To accomplish this we (i) have calculated that percent yields of individual library members following a screen range from 0.002 to 1%, (ii) extrapolated that ∼1 million copies per library member are required at the outset of a screen, and (iii) from this extrapolation predict that false negative rates will begin to outweigh the benefit of increased diversity at library sizes >10. The above analysis is based upon a large internal data set comprising multiple screens, targets, and libraries; we also augmented our internal data with all currently available literature data. In theory, high false negative rates may be overcome by employing larger amounts of library; however, we argue that using more than currently reported amounts of library (≫10 nmoles) is impractical. The above conclusions may be generally applicable to other DNA encoded library platforms, particularly those platforms that do not allow for library amplification.
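The arithmetic behind the abstract above can be sketched as follows. This is only a back-of-the-envelope illustration, not the authors' analysis: it assumes every library member is present in equal amounts, and the library sizes used in the example are hypothetical. The 10 nmol input, the ~1 million copies per member, and the 0.002 to 1% yield range are taken from the abstract.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_member(library_nmol, library_size):
    """Average copies of each member, assuming equal representation."""
    total_molecules = library_nmol * 1e-9 * AVOGADRO
    return total_molecules / library_size


def expected_recovered(copies_at_outset, percent_yield):
    """Expected copies of one member recovered after a screen."""
    return copies_at_outset * percent_yield / 100.0


if __name__ == "__main__":
    # Hypothetical library sizes screened with 10 nmol of library.
    for size in (1e6, 1e9):
        copies = copies_per_member(10, size)
        print(f"library size {size:.0e}: {copies:.3g} copies per member")

    # Recovery from ~1 million starting copies at the reported yield extremes.
    for pct in (0.002, 1.0):
        print(f"{pct}% yield: {expected_recovered(1e6, pct):.3g} copies recovered")
```

With a fixed amount of library, copies per member fall in proportion to library size, and at the low end of the reported yield range only a few tens of copies of a member starting at one million copies would be expected to survive the screen. The sketch ignores sequencing depth and sampling noise, which a fuller treatment would need.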