Fingerprint-based similarity searching is widely used for virtual screening when only a single bioactive reference structure is available. This paper reviews three distinct ways of carrying out such searches when multiple bioactive reference structures are available: merging the individual fingerprints into a single combined fingerprint; applying data fusion to the similarity rankings resulting from individual similarity searches; and using approximations to substructural analysis. Extended searches on the MDL Drug Data Report database suggest that fusing similarity scores is the most effective general approach, with the best individual results coming from the binary kernel discrimination technique.
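The first two strategies can be illustrated with a minimal Python sketch. It assumes that each fingerprint is stored as a set of on-bit indices, that similarity is the Tanimoto coefficient, and that the fusion rule is MAX (each database molecule keeps its highest similarity to any reference). These are illustrative choices, not necessarily those of the study; the substructural-analysis approximations are not covered, and all function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Fingerprint = Set[int]  # indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def merged_fingerprint(references: Iterable[Fingerprint]) -> Fingerprint:
    """Strategy 1: combine all reference fingerprints into one (bitwise OR)."""
    merged: Fingerprint = set()
    for fp in references:
        merged |= fp
    return merged


def max_fusion_scores(database: Dict[str, Fingerprint],
                      references: List[Fingerprint]) -> Dict[str, float]:
    """Strategy 2: search with every reference, then fuse by taking the maximum score."""
    return {name: max(tanimoto(fp, ref) for ref in references)
            for name, fp in database.items()}


def rank(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort database molecules by decreasing fused score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With references {1, 2, 3} and {7, 8, 9}, a molecule with bits {7, 8} scores 2/3 under MAX fusion but only 1/3 against the merged fingerprint, because merging dilutes the contribution of each individual reference.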
This paper reports a detailed comparison of a range of different types of 2D fingerprints when used for similarity-based virtual screening with multiple reference structures. Experiments with the MDL Drug Data Report database demonstrate the effectiveness of fingerprints that encode circular substructure descriptors generated using the Morgan algorithm. These fingerprints are notably more effective than fingerprints based on a fragment dictionary, on hashing and on topological pharmacophores. The combination of these fingerprints with data fusion based on similarity scores provides both an effective and an efficient approach to virtual screening in lead-discovery programmes.
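The circular substructure descriptors produced by the Morgan algorithm can be sketched in pure Python. This is a simplified Morgan-style iteration, assuming each heavy atom starts from a single integer invariant (for example, its atomic number) and that the molecule is given as an adjacency list. Production implementations such as ECFP also encode bond types and remove duplicate environments, which this sketch omits. The function names and the SHA-1-based hash are hypothetical choices.

```python
import hashlib
from typing import List, Set


def _hash32(values) -> int:
    """Deterministic 32-bit hash of a nested tuple of ints (unlike salted str hashes)."""
    digest = hashlib.sha1(repr(values).encode()).digest()
    return int.from_bytes(digest[:4], "little")


def circular_identifiers(atom_invariants: List[int],
                         neighbors: List[List[int]],
                         radius: int = 2) -> Set[int]:
    """Morgan-style update: at each iteration an atom's identifier is rehashed
    from its own identifier plus the sorted identifiers of its neighbours,
    so it describes a circular environment one bond wider than before."""
    ids = [_hash32((inv,)) for inv in atom_invariants]  # radius-0 environments
    features = set(ids)
    for _ in range(radius):
        ids = [_hash32((ids[i], tuple(sorted(ids[j] for j in neighbors[i]))))
               for i in range(len(ids))]
        features.update(ids)
    return features


def fold(features: Set[int], n_bits: int = 1024) -> Set[int]:
    """Fold identifiers into a fixed-length fingerprint (set of on-bit indices)."""
    return {f % n_bits for f in features}
```

For propane (three carbons in a chain) with radius 2, the two terminal atoms always share identifiers, so the sketch yields five distinct features rather than nine.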
Similarity searching using a single bioactive reference structure is a well-established technique for accessing chemical structure databases. This paper describes two extensions of the basic approach. First, we discuss the use of group fusion to combine the results of similarity searches when multiple reference structures are available. We demonstrate that this technique is notably more effective than conventional similarity searching in scaffold-hopping searches for structurally diverse sets of active molecules; conversely, the technique will do little to improve the search performance if the actives are structurally homogeneous. Second, we make the assumption that the nearest neighbors resulting from a similarity search, using a single bioactive reference structure, are also active and use this assumption to implement approximate forms of group fusion, substructural analysis, and binary kernel discrimination. This approach, called turbo similarity searching, is notably more effective than conventional similarity searching.
In this study we evaluate how far the scope of similarity searching can be extended to identify not only ligands binding to the same target as the reference ligand(s) but also ligands of other homologous targets without initially known ligands. This "homology-based similarity searching" requires molecular representations that reflect the ability of a molecule to interact with target proteins. The Similog keys, introduced here as a new molecular representation, were designed to fulfill this requirement. They are based only on the molecular constitution and are counts of atom triplets. Each triplet is characterized by the graph distances between its atoms and by the types of those atoms. The atom-typing scheme classifies each atom by its function as an H-bond donor or acceptor and by its electronegativity and bulkiness. In this study the Similog keys are investigated in retrospective in silico screening experiments and compared with other conformation-independent molecular representations. The molecules studied were those of the MDDR database for which the activity data were augmented with standardized target classification information from public protein classification databases. The MDDR molecule set was split randomly into two halves. The first half formed the candidate set. Ligands of four targets (dopamine D2 receptor, opioid delta-receptor, factor Xa serine protease, and progesterone receptor) were taken from the second half to form the respective reference sets. Different similarity calculation methods were used to rank the molecules of the candidate set by their similarity to each of the four reference sets. For each reference set and similarity method, the accumulated counts of molecules binding to the reference target, and to groups of targets of decreasing homology to it, were examined as a function of similarity rank.
In summary, similarity searching based on Unity 2D fingerprints or on Similog keys is found to be equally effective in identifying molecules that bind to the same target as the reference set. However, the Similog keys are more effective than the other investigated methods in identifying ligands that bind to any target belonging to the same family as the reference target. We attribute this superiority to two features: the Similog keys provide a generalization of the chemical elements, and the keys are counted rather than merely recorded as present or absent in binary form. The second most effective molecular representation is the set of occurrence counts of the public ISIS key fragments, which, like the Similog method, incorporates key counting as well as a generalization of the chemical elements. The results obtained suggest that ligands for a new target can be identified by the following three-step procedure: 1. Select at least one target with known ligands that is homologous to the new target. 2. Combine the known ligands of the selected target(s) into a reference set. 3. Search for candidate ligands for the new target by their similarity to the reference set using the Similo...
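The triplet-counting idea behind the Similog keys can be sketched as follows. The sketch assumes that atom types are already assigned as hypothetical string labels (the donor/acceptor/electronegativity/bulkiness classification itself is not implemented) and that the molecule is a connected graph given as an adjacency list; graph distances are bond counts found by breadth-first search. The count-based similarity at the end is a generic min/max Tanimoto variant chosen for illustration, not necessarily the measure used in the study, and all names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations, permutations
from typing import List, Tuple


def graph_distances(neighbors: List[List[int]]) -> List[List[int]]:
    """All-pairs topological (bond-count) distances by breadth-first search.
    Unreachable pairs stay at -1; molecules are assumed to be connected."""
    n = len(neighbors)
    dist = [[-1] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[start][v] < 0:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def triplet_key(types: List[str], dist: List[List[int]], triple) -> Tuple:
    """Order-independent key: smallest (t_a, t_b, t_c, d_ab, d_bc, d_ac) over all orderings."""
    return min((types[a], types[b], types[c], dist[a][b], dist[b][c], dist[a][c])
               for a, b, c in permutations(triple))


def triplet_counts(types: List[str], neighbors: List[List[int]]) -> Counter:
    """Count every atom triplet, keyed by its atom types and graph distances."""
    dist = graph_distances(neighbors)
    return Counter(triplet_key(types, dist, t)
                   for t in combinations(range(len(types)), 3))


def count_similarity(a: Counter, b: Counter) -> float:
    """Tanimoto-style similarity on counts: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0
```

For a four-atom chain with identical atom types, the four triplets collapse onto two keys, each counted twice; a binary representation would record only that both keys are present.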
Computers in chemistry V 0380 Similarity Metrics for Ligands Reflecting the Similarity of the Target Proteins. (SCHUFFENHAUER*, A.; FLOERSHEIM, P.; ACKLIN, P.; JACOBY, E.; J. Chem. Inf. Comput. Sci. 43 (2003) 2, 391-405; Drug Discovery Cent., Novartis Pharm. AG, CH-4002 Basel, Switz.; Eng.) Lindner 21-200
Successful treatment of beta-thalassemia requires two key elements: blood transfusion and iron chelation. Regular blood transfusions considerably extend patients' lifespan; however, unless the resulting accumulation of body iron is removed, few patients live beyond their second decade. In 1963, the introduction of desferrioxamine (DFO), a hexadentate chelator, marked a breakthrough in the treatment of beta-thalassemia. DFO significantly reduces body iron burden and iron-related morbidity and mortality. DFO is still the only drug in general use for the treatment of transfusion-dependent iron overload. However, its very short plasma half-life and poor oral activity necessitate special modes of administration (subcutaneous or intravenous infusion), which are inconvenient, can cause local reactions and are poorly accepted by many patients. Over the past four decades, many different laboratories have invested major efforts in identifying orally active iron chelators among several hundred molecules of synthetic, microbial or plant origin. The discovery of ferrithiocin in 1980, followed by the synthesis of the tridentate chelator desferrithiocin and proof of its oral activity, raised considerable hope. However, the compound proved to be toxic in animals. Over a period of about fifteen years, work on many desferrithiocin derivatives and on more broadly modified molecules led to numerous new compounds, some of which were much better tolerated and more efficacious than desferrithiocin in animals; however, none was safe enough to proceed to clinical use. The discovery of a new chemical class of iron chelators, the bis-hydroxyphenyltriazoles, re-energized the search for a safe tridentate chelator. The basic structure of this completely new chemical class of iron chelators was discovered through a combination of rational design, intuition and experience. More than forty derivatives of the triazole series were synthesized at Novartis.
These compounds were evaluated together with more than 700 chelators from various chemical classes. Using rigorous selection criteria focused on tolerability, the tridentate chelator 4-[(3,5-Bis-(2-hydroxyphenyl)-1,2,4)triazol-1-yl]-benzoic acid (ICL670) emerged as the compound that best combined high oral potency and tolerability in animals. ICL670 is currently being evaluated in the clinic.
We test the hypothesis that fusing the outputs of similarity searches based on a single bioactive reference structure and on its nearest neighbors (of unknown activity) is more effective (in terms of numbers of high-ranked active structures) than a similarity search involving just the reference structure. This turbo similarity searching approach provides a simple way to enhance the effectiveness of simulated virtual screening searches of the MDL Drug Data Report database.
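A minimal sketch of turbo similarity searching, assuming set-of-bits fingerprints, Tanimoto similarity, and MAX fusion of the scores obtained with the reference structure and with its k nearest neighbours (which are simply assumed to be active). The fusion rule, the value of k and all function names are illustrative assumptions rather than the published protocol.

```python
from typing import Dict, List, Set, Tuple

Fingerprint = Set[int]  # indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient for binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(query: Fingerprint,
                      database: Dict[str, Fingerprint]) -> List[Tuple[str, float]]:
    """Conventional search: rank all database molecules by similarity to one query."""
    return sorted(((name, tanimoto(query, fp)) for name, fp in database.items()),
                  key=lambda kv: kv[1], reverse=True)


def turbo_search(reference: Fingerprint, database: Dict[str, Fingerprint],
                 k: int = 2) -> List[Tuple[str, float]]:
    """Turbo search: take the k nearest neighbours of the reference (activity
    unknown, assumed active), search with each of them as well, and fuse
    the scores by keeping each molecule's maximum similarity to any query."""
    neighbours = [name for name, _ in similarity_search(reference, database)[:k]]
    queries = [reference] + [database[name] for name in neighbours]
    fused = {name: max(tanimoto(q, fp) for q in queries)
             for name, fp in database.items()}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

In a toy database where compound "b" shares no bits with the reference but one bit with the reference's nearest neighbour, a conventional search gives "b" a score of 0, whereas the turbo search gives it 1/6 and ranks it above an unrelated compound.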
The technology underpinning high-throughput docking (HTD) has developed over the past few years to the point where it has become a vital tool in modern drug discovery. Although the performance of various docking algorithms is adequate, the ability to rank compounds accurately and consistently using a scoring function remains problematic. We show that by employing a simple machine learning method (naïve Bayes) it is possible to overcome this deficiency to a significant extent. Compounds from the Available Chemical Directory (ACD), along with known active compounds, were docked into two protein targets using three software packages. In cases where HTD alone showed some enrichment, applying naïve Bayes improved it further. This methodology for enriching HTD results can be carried out without a priori knowledge of compound activity, and it yields superior enrichment of known actives compared with scoring methods alone.
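The abstract does not state how naïve Bayes is trained without activity data. One plausible reading, sketched here purely as an assumption, is to treat the top-ranked fraction of the docking output as putative actives and the remainder as inactives, train a Bernoulli naïve Bayes model with Laplace smoothing on binary structural features, and re-rank all docked compounds by the model's log-likelihood ratio. The function name, the `top_fraction` parameter and the choice of a Bernoulli model are hypothetical.

```python
import math
from typing import Dict, List, Set


def naive_bayes_rerank(docking_ranked: List[str],
                       features: Dict[str, Set[int]],
                       top_fraction: float = 0.1) -> List[str]:
    """Re-rank docked compounds with a Bernoulli naive Bayes model.

    ASSUMPTION: training labels are pseudo-labels taken from the docking run
    itself (top `top_fraction` = "active", rest = "inactive"), so no measured
    activities are needed.
    """
    n_top = max(1, int(len(docking_ranked) * top_fraction))
    actives, inactives = docking_ranked[:n_top], docking_ranked[n_top:]
    vocab = set().union(*(features[m] for m in docking_ranked))

    def bit_probs(group: List[str]) -> Dict[int, float]:
        # Laplace-smoothed estimate of P(bit is set | class); never 0 or 1
        return {b: (sum(b in features[m] for m in group) + 1) / (len(group) + 2)
                for b in vocab}

    p_act, p_inact = bit_probs(actives), bit_probs(inactives)

    def log_likelihood_ratio(m: str) -> float:
        # Sum of per-bit log ratios; the class prior is the same for every
        # compound, so it does not change the ranking and is omitted
        score = 0.0
        for b in vocab:
            if b in features[m]:
                score += math.log(p_act[b] / p_inact[b])
            else:
                score += math.log((1 - p_act[b]) / (1 - p_inact[b]))
        return score

    # Python's sort is stable, so ties keep their original docking order
    return sorted(docking_ranked, key=log_likelihood_ratio, reverse=True)
```

Compounds that share features with the top docking hits are pulled forward in the new ranking, which is the mechanism by which this kind of re-ranking can improve enrichment.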