Several modifications were introduced into the previously described centroid diversity sorting algorithm, which uses the cosine similarity metric. The modified algorithm is suitable for working with large databases on personal computers. For example, diversity sorting of a database of more than a million records takes less than 9 h (Pentium III, 800 MHz). The problem of selecting new compounds for an existing collection so as to maximize the diversity of that collection is also examined. The article describes the new algorithm for the selection of heterocyclic compounds.
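The abstract does not spell out the modified algorithm, so the following is only a minimal sketch of one common formulation of centroid-based diversity sorting with cosine similarity. It assumes each compound is represented by a numeric descriptor vector. The function names, the `seed_index` parameter, and the tie-breaking rule are illustrative assumptions, not details taken from the article. The idea is that, for unit-length vectors, the sum of cosine similarities between a candidate and all already selected compounds equals the dot product of the candidate with the running sum of the selected vectors, so each greedy step costs one pass over the remaining compounds.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length; an all-zero vector stays all-zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def centroid_diversity_order(descriptors, seed_index=0):
    """Return compound indices in a greedy 'most dissimilar to centroid' order.

    Assumption: at each step the next compound is the one whose cosine
    similarity to the (unnormalized) centroid of the selected set is lowest.
    """
    unit = [_normalize(v) for v in descriptors]
    if not unit:
        return []
    order = [seed_index]
    remaining = set(range(len(unit))) - {seed_index}
    # Running sum of the selected unit vectors (the centroid up to a scale factor).
    centroid_sum = list(unit[seed_index])
    while remaining:
        # Lowest dot product with the centroid sum; ties go to the lower index.
        nxt = min(
            remaining,
            key=lambda i: (sum(a * b for a, b in zip(unit[i], centroid_sum)), i),
        )
        order.append(nxt)
        remaining.remove(nxt)
        centroid_sum = [c + u for c, u in zip(centroid_sum, unit[nxt])]
    return order


# Toy descriptor vectors, made up for illustration only.
toy = [[1, 0, 0], [1, 0.1, 0], [0, 1, 0], [0, 0, 1]]
print(centroid_diversity_order(toy))
```

Under the same assumptions, selecting new compounds for an existing collection could be sketched by initializing `centroid_sum` from the unit vectors of the collection instead of from a single seed compound.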
A new approach for predicting the lipophilicity (log P), solubility (log Sw), and oral absorption in humans (FA) of drugs is described. It is based on structural and physicochemical similarity and is implemented in the software program SLIPPER-2001. Calculated and experimental values of log P, log Sw, and FA for 42 drugs were used to demonstrate the predictive power of the program. Reliable results were obtained for simple compounds, for complex chemicals, and for drugs. Thus, the principle that "similar compounds display similar properties", combined with estimating incremental changes in properties from differences in physicochemical parameters, yields "structure-property" predictive models even in the absence of a precise understanding of the mechanisms involved.
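The principle stated above can be illustrated with a small sketch. This is not the SLIPPER-2001 procedure, which the abstract does not detail. It assumes that structural similarity is measured by a Tanimoto coefficient over sets of structural features, and that the incremental change is a linear correction in the differences of physicochemical parameters. The feature labels, the parameter name `alpha`, the coefficient, and all numbers are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def predict_from_nearest(target_features, target_params, references, coefficients):
    """Predict a property from the most similar reference compound.

    The prediction is the reference value plus a linear correction for the
    differences in physicochemical parameters (an assumed functional form).
    """
    best = max(references, key=lambda r: tanimoto(target_features, r["features"]))
    correction = sum(
        coefficients[name] * (target_params[name] - best["params"][name])
        for name in coefficients
    )
    return best["value"] + correction


# Toy reference set; values are invented for illustration, not data from the study.
references = [
    {"features": {"a", "b"}, "params": {"alpha": 10.0}, "value": 2.0},
    {"features": {"c"}, "params": {"alpha": 5.0}, "value": -1.0},
]
predicted = predict_from_nearest({"a", "b", "d"}, {"alpha": 12.0}, references, {"alpha": 0.5})
print(predicted)
```

The nearest neighbour supplies the baseline value, so the correction only has to account for the small difference between two similar compounds. This is why such a model can work without a precise mechanistic description.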