In similarity-driven virtual screening, molecular fingerprints are widely used to assess the similarity of all compounds contained in a chemical library to a query compound of interest. This similarity analysis is traditionally done for each member of the library individually. For chemical spaces that exceed billions of compounds in size, it becomes impractical to enumerate all their products, let alone assess their similarity, which makes this approach infeasible without a substantial investment of resources. In this work, we present a novel search algorithm named SpaceLight for topological fingerprint similarity searching in large, practically non-enumerable combinatorial fragment spaces. In contrast to existing methods, SpaceLight exploits the combinatorial character of these chemical spaces for efficiency while maintaining a high correlation between its description of molecular similarity and that of well-known molecular fingerprints like ECFP. The resulting software is able to search prominent spaces like EnamineREAL with more than 10 billion compounds in seconds on a standard desktop computer.
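The traditional per-compound analysis mentioned above can be sketched as follows. This is a minimal illustration, not SpaceLight itself: fingerprints are assumed to be plain sets of feature identifiers, similarity is assumed to be the Tanimoto coefficient, and the names `tanimoto`, `rank_library`, and the toy library are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of feature ids."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_library(query: set, library: dict) -> list:
    """Score every library member individually against the query, best first."""
    scored = [(tanimoto(query, fp), name) for name, fp in library.items()]
    return sorted(scored, reverse=True)


# Toy data (hypothetical feature ids, not real ECFP output).
query = {1, 2, 3}
library = {
    "cpd_a": {2, 3, 4},
    "cpd_b": {1, 2, 3},
    "cpd_c": {7, 8},
}
ranking = rank_library(query, library)
```

Because every library member is scored separately, the cost grows linearly with the number of enumerated compounds, which is the bottleneck the abstract describes for spaces with billions of products.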
Molecular fingerprints are an efficient and widely used method for similarity-driven virtual screening. Most fingerprint methods can be distinguished by the class of structural features they consider, and each is limited to that class. The Connected Subgraph Fingerprint (CSFP) overcomes this limitation and regards all structural features of a compound. This results in a more complete feature space and a high potential for adaptation to specific application scenarios. The novel descriptor surpasses widely used fingerprint methods in some cases and opens the way for topological search in combinatorial fragment spaces.
The set of chemical compounds shared by two or more chemical libraries is assessed routinely as a means of comparing these libraries for various applications. Traditionally, this is achieved by comparing the members of the chemical libraries individually for identity. This approach becomes impractical when operating on chemical libraries exceeding billions or even trillions of compounds in size. As a result, no such analysis exists for ultralarge chemical spaces like the Enamine REAL Space containing over 20 billion compounds. In this work, we present a novel tool called SpaceCompare for the overlap calculation of large, nonenumerable combinatorial fragment spaces. In contrast to existing methods, SpaceCompare utilizes topological fingerprints and the combinatorial character of these chemical spaces. The tool is able to determine the exact overlap of prominent spaces like Enamine's REAL Space, WuXi's GalaXi Space, and Otava's CHEMriya for the first time.
Confirming a conjecture of Vera T. Sós in a very strong sense, we give a complete solution to Turán's hypergraph problem for the Fano plane. That is we prove for n ≥ 8 that among all 3-uniform hypergraphs on n vertices not containing the Fano plane there is indeed exactly one whose number of edges is maximal, namely the balanced, complete, bipartite hypergraph. Moreover, for n = 7 there is exactly one other extremal configuration with the same number of edges: the hypergraph arising from a clique of order 7 by removing all five edges containing a fixed pair of vertices. For sufficiently large values n this was proved earlier by Füredi and Simonovits, and by Keevash and Sudakov, who utilised the stability method. 2010 Mathematics Subject Classification. 05C65, 05D05.
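The edge counts behind the n = 7 statement can be checked with a few lines of arithmetic. The sketch below assumes the usual reading of "balanced, complete, bipartite hypergraph": the vertex set is split into two parts of sizes ⌊n/2⌋ and ⌈n/2⌉, and the edges are exactly the triples meeting both parts. The function names are illustrative only.

```python
from math import comb


def balanced_bipartite_edges(n: int) -> int:
    """Triples on n vertices that meet both parts of a balanced bipartition."""
    small, large = n // 2, n - n // 2
    # All triples minus those lying entirely inside one part.
    return comb(n, 3) - comb(small, 3) - comb(large, 3)


def clique_minus_pair_edges(n: int) -> int:
    """Complete 3-uniform hypergraph minus all edges containing one fixed pair."""
    # A fixed pair of vertices lies in exactly n - 2 triples.
    return comb(n, 3) - (n - 2)


# For n = 7 both configurations have 30 edges: 35 - 1 - 4 and 35 - 5.
edges_bipartite_7 = balanced_bipartite_edges(7)
edges_clique_7 = clique_minus_pair_edges(7)
```

The second function is only relevant for n = 7, where removing the five edges through a fixed pair from the clique yields the same count as the bipartite construction.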
The distributions of physicochemical property values, like the octanol–water partition coefficient, are routinely calculated to describe and compare virtual chemical libraries. Traditionally, these distributions are derived by processing each member of a library individually and summarizing all values in a distribution. This process becomes impractical when operating on chemical spaces which surpass billions of compounds in size. In this work, we present a novel algorithmic method called SpaceProp for the property distribution calculation of large nonenumerable combinatorial fragment spaces. The novel method follows a combinatorial approach and is able to calculate physicochemical property distributions of prominent spaces like Enamine’s REAL Space, WuXi’s GalaXi Space, and OTAVA’s CHEMriya Space for the first time. Furthermore, we present a first approach of optimizing property distributions directly in combinatorial fragment spaces.
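To illustrate why a combinatorial approach can avoid enumeration, consider a deliberately simplified case. The sketch assumes a property that is strictly additive over fragments (for example a heavy-atom count) and a space formed by combining one fragment from each of two pools; neither assumption is taken from the abstract, and the code is not the SpaceProp method. The name `combine` and the toy values are hypothetical.

```python
from collections import Counter


def combine(hist_a: Counter, hist_b: Counter) -> Counter:
    """Property histogram of all pairwise products, for an additive property.

    Works on the per-pool histograms only, so the cost depends on the number
    of distinct property values, not on the number of products.
    """
    out = Counter()
    for value_a, count_a in hist_a.items():
        for value_b, count_b in hist_b.items():
            out[value_a + value_b] += count_a * count_b
    return out


# Toy fragment pools with hypothetical heavy-atom counts.
pool_a = Counter([5, 5, 7])
pool_b = Counter([3, 4])

# Distribution over all 3 * 2 = 6 products without building any of them.
product_hist = combine(pool_a, pool_b)
```

For properties that are not additive over fragments, or for reactions that remove atoms, this simple convolution would need correction terms; the sketch only conveys the general idea of summarizing fragments instead of products.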