Several modifications were introduced into the previously described Centroid diversity sorting algorithm, which uses the cosine similarity metric. The modified algorithm is suitable for working with large databases on personal computers. For example, diversity sorting of a database of more than a million records takes less than 9 h (Pentium III, 800 MHz). The problem of selecting new compounds for addition to an existing collection so as to maximize the diversity of the collection is examined. The article also describes a new algorithm for the selection of heterocyclic compounds.
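The general idea behind centroid-based diversity sorting with the cosine metric can be illustrated with a minimal sketch. It is not the modified algorithm from the abstract, whose details are not given here. It assumes that each record is a numeric descriptor vector and that the next record chosen is the one with the lowest summed cosine similarity to the records already selected. The names `normalize`, `centroid_diversity_order`, and `seed_index` are hypothetical.

```python
import math


def normalize(vec):
    """Scale a descriptor vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("a zero vector has no defined cosine similarity")
    return [x / norm for x in vec]


def centroid_diversity_order(vectors, seed_index=0):
    """Return record indices in a diversity-sorted order.

    Assumption: at each step the next record is the one whose summed cosine
    similarity to the already-selected records is lowest.
    """
    unit = [normalize(v) for v in vectors]
    order = [seed_index]
    remaining = set(range(len(unit))) - {seed_index}
    # Sum of the unit vectors selected so far (the unscaled centroid).
    running_sum = list(unit[seed_index])

    while remaining:
        # dot(u, running_sum) equals the sum of cosine similarities between u
        # and every selected record, so no pairwise loop is needed.
        nxt = min(
            sorted(remaining),  # sorted so ties resolve to the lowest index
            key=lambda i: sum(a * b for a, b in zip(unit[i], running_sum)),
        )
        order.append(nxt)
        remaining.remove(nxt)
        running_sum = [s + u for s, u in zip(running_sum, unit[nxt])]
    return order


# Illustrative vectors only, not real compound descriptors.
vectors = [[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
order = centroid_diversity_order(vectors)
```

In this toy input the second vector is nearly parallel to the seed, so it is ranked last. Because the sum of selected unit vectors is kept as a running total, each selection step is linear in the number of remaining records, which is the property that makes the centroid formulation attractive for large databases.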
Efficient recognition of tautomeric compound forms in large corporate or commercially available compound databases is a difficult and labor-intensive task. Our data indicate that up to 0.5% of commercially available compound collections for bioscreening contain tautomers. Although in the large registry databases, such as Beilstein and CAS, tautomers are found in an automated fashion using high-performance computational technologies, their real-time recognition in nonregistry corporate databases, as a rule, remains problematic. We have developed an effective algorithm for tautomer searching based on a proprietary chemoinformatics platform. This algorithm reduces each compound to a canonical structure, which enables rapid, automated searching for most of the known tautomeric transformations that occur in databases of organic compounds. Another useful extension of this methodology is the ability to search effectively for different forms of compounds that contain ionic and semipolar bonds. Conveniently, the computations are performed in the Windows environment on a standard personal computer. The practical application of the proposed methodology is illustrated by several examples of successful recovery of tautomers and different forms of ionic compounds from real, commercially available nonregistry databases.
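Reducing every structure to a canonical form turns the tautomer search into a grouping problem: records that share a canonical key are candidate tautomers of one another. The sketch below shows only that grouping step. The canonicalizer is a toy lookup table standing in for the proprietary algorithm, which is not described here, and the names `TOY_CANONICAL_FORM`, `canonical_key`, and `group_by_canonical_form` are hypothetical.

```python
from collections import defaultdict

# Toy stand-in for a real canonicalizer: maps a known tautomeric form to one
# chosen representative. A real system would derive the canonical structure
# from the molecular graph rather than from a lookup table.
TOY_CANONICAL_FORM = {
    "CC(O)=C": "CC(=O)C",  # enol form of acetone -> keto form
}


def canonical_key(structure):
    """Return the canonical representative of a structure string."""
    return TOY_CANONICAL_FORM.get(structure, structure)


def group_by_canonical_form(records, canonicalize=canonical_key):
    """Group record ids by canonical structure; keep groups with > 1 member."""
    groups = defaultdict(list)
    for record_id, structure in records:
        groups[canonicalize(structure)].append(record_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical record ids; the structures are written as SMILES strings.
records = [("rec-1", "CC(=O)C"), ("rec-2", "CC(O)=C"), ("rec-3", "CCO")]
duplicates = group_by_canonical_form(records)
```

Here `rec-1` and `rec-2` fall into the same group because both map to the keto form. Each record is canonicalized once and the grouping is a single pass over the database, so the cost grows linearly with the number of records.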
A new approach for predicting the lipophilicity (log P), solubility (log Sw), and oral absorption of drugs in humans (FA) is described. It is based on structural and physicochemical similarity and is implemented in the software program SLIPPER-2001. Calculated and experimental values of log P, log Sw, and FA for 42 drugs were used to demonstrate the predictive power of the program. Reliable results were obtained for simple compounds, for complex chemicals, and for drugs. Thus, the principle that "similar compounds display similar properties", combined with estimating incremental changes in properties from differences in physicochemical parameters, yields "structure-property" predictive models even in the absence of a precise understanding of the mechanisms involved.
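The "nearest analogue plus incremental correction" principle can be sketched as follows. This is not the SLIPPER-2001 implementation, which is not described here. It assumes that structural similarity is measured with a Tanimoto coefficient over feature sets and that the correction is linear in the descriptor differences. The function names, the record layout, and the coefficient values are all hypothetical, and the numbers are illustrative rather than experimental data.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two collections of structural features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def predict_property(query, references, coefficients):
    """Predict a property from the most similar reference compound.

    Assumption: prediction = property of the nearest analogue plus a linear
    correction for the differences in physicochemical descriptors.
    """
    nearest = max(
        references,
        key=lambda ref: tanimoto(query["features"], ref["features"]),
    )
    correction = sum(
        coef * (query["descriptors"][name] - nearest["descriptors"][name])
        for name, coef in coefficients.items()
    )
    return nearest["value"] + correction


# Illustrative placeholder data, not measured values.
references = [
    {"features": {"f1", "f2", "f3"}, "descriptors": {"d1": 3.0}, "value": 2.0},
    {"features": {"f7", "f8"}, "descriptors": {"d1": 9.0}, "value": -1.0},
]
query = {"features": {"f1", "f2", "f4"}, "descriptors": {"d1": 5.0}}
predicted = predict_property(query, references, coefficients={"d1": 0.5})
```

The first reference shares two of four features with the query and is therefore the nearest analogue. Its value of 2.0 is shifted by 0.5 × (5.0 − 3.0) = 1.0, giving a prediction of 3.0.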
An efficient program for the storage, retrieval, and processing of chemical information, which runs on a personal computer, is presented. The program can work either as a stand-alone application or in conjunction with a specifically written Web server application or with some standard SQL servers, e.g., Oracle, Interbase, and MS SQL. New types of data fields are introduced, e.g., arrays for spectral information storage, HTML and database links, and user-defined functions. CheD has an open architecture; thus, custom data types, controls, and services may be added. A WWW server application for chemical data retrieval features easy, user-friendly installation on Windows NT or 95 platforms.