The Pfizer Global Virtual Library (PGVL) of 10^13 readily synthesizable molecules offers a tremendous opportunity for lead optimization and scaffold hopping in drug discovery projects. However, mining a chemical space of this size is a challenge for the accompanying design informatics: standard molecular similarity searches against a collection of explicit molecules cannot be used, because no chemical information system could create and manage more than 10^8 explicit molecules. By accepting a tolerable level of false negatives in the search results, we were able to bypass full enumeration of the 10^13 molecules and enable efficient similarity search and retrieval in this huge chemical space for practical use by medicinal chemists. In this report, two search methods (LEAP1 and LEAP2) are presented. The first method uses PGVL reaction knowledge to disassemble the incoming query molecule into a set of reactants, and then uses reactant-level similarities to the starting materials that are actually available to focus on a much smaller sub-region of the full virtual library compound space. This sub-region is then explicitly enumerated and searched via a standard similarity method using the original query molecule. The second method uses a fuzzy mapping onto candidate reactions and does not require exact disassembly of the incoming query molecule. Instead, Basis Products (or capped reactants) are mapped onto the query molecule, and the resulting asymmetric similarity scores are used to prioritize the corresponding reactions and reactant sets. All sets of Basis Products are inherently indexed to specific reactions and specific starting materials. This again allows focusing on a much smaller sub-region for explicit enumeration and subsequent standard product-level similarity search. A set of validation studies was conducted. The results show that the level of false negatives for the disassembly-based method is acceptable when the query molecule can be recognized for exact disassembly. The fuzzy reaction mapping method based on Basis Products performs even better, with a lower false-negative rate, because it does not require the query molecule to be recognized by any disassembly algorithm. Both search methods have been implemented and are accessed through a powerful desktop molecular design tool (see ref. (33) for details). The chapter ends with a comparison of published methods for searching large virtual chemical spaces.
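The two-stage idea described above (narrow the space by reactant-level similarity, then enumerate and score only the resulting sub-region at the product level) can be illustrated with a small sketch. This is not the LEAP1 or LEAP2 implementation. Everything in it is an assumption made for illustration: molecules are represented as plain Python sets of feature labels instead of real fingerprints, a "product" is simply the union of its reactants' features, and the names `tanimoto`, `tversky`, `two_stage_search`, `top_k`, and `threshold` are hypothetical. The Tversky function with `alpha=1, beta=0` is one common way to express an asymmetric similarity; the abstract does not say which asymmetric measure the authors used.

```python
from itertools import product


def tanimoto(a, b):
    """Symmetric Tanimoto similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tversky(fragment, query, alpha=1.0, beta=0.0):
    """Asymmetric Tversky similarity.

    With alpha=1 and beta=0 this is the fraction of the fragment's
    features that also occur in the query, so a small fragment fully
    contained in a larger query scores 1.0.
    """
    common = len(fragment & query)
    denom = (common
             + alpha * len(fragment - query)
             + beta * len(query - fragment))
    return common / denom if denom else 0.0


def two_stage_search(query, query_parts, reactant_pools, top_k=2, threshold=0.5):
    """Toy two-stage search (illustrative only, not the published method).

    query          -- feature set of the whole query molecule
    query_parts    -- one feature set per reactant position (assumed to
                      come from some disassembly step)
    reactant_pools -- one dict per position: reactant name -> feature set
    """
    # Stage 1: keep only the top_k most similar reactants per position.
    shortlists = []
    for part, pool in zip(query_parts, reactant_pools):
        ranked = sorted(pool, key=lambda name: tanimoto(part, pool[name]),
                        reverse=True)
        shortlists.append(ranked[:top_k])

    # Stage 2: enumerate only the shortlisted sub-region and score each
    # product against the original query.
    hits = []
    for combo in product(*shortlists):
        # Toy enumeration: a product's features are the union of its
        # reactants' features (a deliberate simplification).
        features = frozenset().union(
            *(pool[name] for name, pool in zip(combo, reactant_pools)))
        score = tanimoto(query, features)
        if score >= threshold:
            hits.append((combo, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical reactant pools for a two-component reaction.
acids = {"A1": {"a", "b"}, "A2": {"a", "c"}, "A3": {"x", "y"}}
amines = {"B1": {"m", "n"}, "B2": {"m", "p"}, "B3": {"z"}}

query = {"a", "b", "m", "n"}
query_parts = [{"a", "b"}, {"m", "n"}]

hits = two_stage_search(query, query_parts, [acids, amines])
for combo, score in hits:
    print(combo, round(score, 2))
```

In this toy setup only 4 of the 9 possible products are ever enumerated. Any good product whose reactants fall outside the per-position shortlist is missed, which is the kind of false negative the abstract describes accepting in exchange for avoiding full enumeration.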
An unprecedented amount of parallel synthesis information has been accumulated within Pfizer over the past 12 years. This information is captured by an informatics tool known as PGVL (Pfizer Global Virtual Library). PGVL has been used in many aspects of drug discovery, including automated reactant mining and reaction product formation to build a synthetically feasible virtual compound collection. In this report, PGVL is discussed in detail. The chemistry information within PGVL has been used to extract synthesis and design information through an intuitive desktop graphical user interface, PGVL Hub. Several real-case examples of PGVL use are also presented.
PGVL Hub is an integrated molecular design desktop tool that has been developed and globally deployed throughout Pfizer discovery research units to streamline the design and synthesis of combinatorial libraries and singleton compounds. The tool supports various workflows for the design of singletons, combinatorial libraries, and Markush exemplification. It also leverages the proprietary PGVL virtual space (which contains 10^14 molecules spanned by experimentally derived synthesis protocols and suitable reactants) for lead idea generation, lead hopping, and library design. There has been an intense focus on ease of use, good performance and robustness, and synergy with existing desktop tools such as ISIS/Draw and SpotFire. In this chapter we describe the three-tier enterprise software architecture, the key data structures that enable a wide variety of design scenarios and workflows, the major technical challenges encountered and solved, and the lessons learned during development and deployment across its production cycles. In addition, PGVL Hub is an extensible, enabling platform that supports future innovations in library and singleton compound design and a proven channel for delivering those innovations to medicinal chemists on a global scale.