ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 and 2014 Nucleic Acids Research Database Issues. Since then, alongside the continued extraction of data from the medicinal chemistry literature, new sources of bioactivity data have also been added to the database. These include: deposited data sets from neglected disease screening; crop protection data; drug metabolism and disposition data and bioactivity data from patents. A number of improvements and new features have also been incorporated. These include the annotation of assays and targets using ontologies, the inclusion of targets and indications for clinical candidates, addition of metabolic pathways for drugs and calculation of structural alerts. The ChEMBL data can be accessed via a web-interface, RDF distribution, data downloads and RESTful web-services.
ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 Nucleic Acids Research Database Issue. Since then, a variety of new data sources and improvements in functionality have contributed to the growth and utility of the resource. In particular, more comprehensive tracking of compounds from research stages through clinical development to market is provided through the inclusion of data from United States Adopted Name applications; a new richer data model for representing drug targets has been developed; and a number of methods have been put in place to allow users to more easily identify reliable data. Finally, access to ChEMBL is now available via a new Resource Description Framework format, in addition to the web-based interface, data downloads and web services.
Voltage-gated sodium channels drive the initial depolarization phase of the cardiac action potential and therefore critically determine conduction of excitation through the heart. In patients, deletions or loss-of-function mutations of the cardiac sodium channel gene, SCN5A, have been associated with a wide range of arrhythmias including bradycardia (heart rate slowing), atrioventricular conduction delay, and ventricular fibrillation. The pathophysiological basis of these clinical conditions is unresolved. Here we show that disruption of the mouse cardiac sodium channel gene, Scn5a, causes intrauterine lethality in homozygotes with severe defects in ventricular morphogenesis whereas heterozygotes show normal survival. Whole-cell patch clamp analyses of isolated ventricular myocytes from adult Scn5a+/− mice demonstrate a ≈50% reduction in sodium conductance. Scn5a+/− hearts have several defects including impaired atrioventricular conduction, delayed intramyocardial conduction, increased ventricular refractoriness, and ventricular tachycardia with characteristics of reentrant excitation. These findings reconcile reduced activity of the cardiac sodium channel leading to slowed conduction with several apparently diverse clinical phenotypes, providing a model for the detailed analysis of the pathophysiology of arrhythmias. Cardiac arrhythmias, manifest clinically by symptoms of extra, slow, or rapid heart beats, form one of the most common groups of diseases (1). The detailed understanding of the pathophysiology of these conditions now seems possible (2), having been advanced by the identification of ion channel mutations in patients with these conditions (3-5). What has become clear is that the functional consequences of such mutations can be complex, resolved only by combining appropriate clinical, experimental, and theoretical approaches (2).
Accordingly, the consequences of gain-of-function mutations in the cardiac sodium channel gene, SCN5A, in patients with long-QT syndrome (LQT3) (6, 7), have been investigated by studies of clinical genotype-phenotype relationships (3-5, 8) and their cellular electrophysiology (9, 10) by using computer models (11, 12) and the construction of a transgenic mouse (13). The results of these various investigations have allowed a clearer picture to emerge of the pathophysiology of LQT3 (7). In addition to the descriptions of long-QT syndrome-associated mutations, loss-of-function mutations in SCN5A (14, 15) have been described in patients with phenotypic characteristics of bradycardia (16, 17), atrioventricular block (16, 18), and ventricular fibrillation (18-22). These observations suggest a central role for the sodium channel in the maintenance of the normal heart beat (23-25). The mechanism of arrhythmias in these conditions, however, remains unresolved, although fibrillation could result from delayed conduction, unidirectional block, and reentrant excitation (3, 4). We have used homologous recombination in embryonic stem cells to establish mice with a null mutatio...
ChEMBL is now a well-established resource in the fields of drug discovery and medicinal chemistry research. The ChEMBL database curates and stores standardized bioactivity, molecule, target and drug data extracted from multiple sources, including the primary medicinal chemistry literature. Programmatic access to ChEMBL data has been improved by a recent update to the ChEMBL web services (version 2.0.x, https://www.ebi.ac.uk/chembl/api/data/docs), which exposes significantly more data from the underlying database and introduces new functionality. To complement the data-focused services, a utility service (version 1.0.x, https://www.ebi.ac.uk/chembl/api/utils/docs), which provides RESTful access to commonly used cheminformatics methods, has also been concurrently developed. The ChEMBL web services can be used together or independently to build applications and data processing workflows relevant to drug discovery and chemical biology.
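As a rough illustration of what RESTful access to the data services might look like, the sketch below only builds request URLs and does not contact any server. The base URL is the documentation link from the abstract with the trailing "/docs" dropped, which is an assumption, and the helper `build_url`, the resource names, and the filter names are all hypothetical and should be checked against the service documentation.

```python
from urllib.parse import urlencode

# Assumption: the data services live under the documentation URL quoted in
# the abstract, minus the trailing "/docs".
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_url(resource, chembl_id=None, fmt="json", **filters):
    """Build a request URL for a resource, optionally for a single record.

    `resource`, `chembl_id` and the filter names are illustrative; this
    helper is hypothetical and is not part of any official client.
    """
    path = f"{BASE_URL}/{resource}"
    if chembl_id is not None:
        path += f"/{chembl_id}"
    path += f".{fmt}"
    if filters:
        # Sort the filters so the generated URL is deterministic.
        path += "?" + urlencode(sorted(filters.items()))
    return path


if __name__ == "__main__":
    print(build_url("molecule", "CHEMBL25"))
    print(build_url("activity", target_chembl_id="CHEMBL240", limit=5))
```

The resulting strings can be passed to any HTTP client. Keeping URL construction separate from the network call makes a workflow easier to test offline.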
A major cause of the paucity of new starting points for drug discovery is the lack of interaction between academia and industry. Much of the global resource in biology is present in universities, whereas the focus of medicinal chemistry is still largely within industry. Open source drug discovery, with sharing of information, is clearly a first step towards overcoming this gap. But the interface could especially be bridged through a scale-up of open sharing of physical compounds, which would accelerate the finding of new starting points for drug discovery. The Medicines for Malaria Venture Malaria Box is a collection of over 400 compounds representing families of structures identified in phenotypic screens of pharmaceutical and academic libraries against the Plasmodium falciparum malaria parasite. The set has now been distributed to almost 200 research groups globally in the last two years, with the only stipulation that information from the screens is deposited in the public domain. This paper reports for the first time on 236 screens that have been carried out against the Malaria Box and compares these results with 55 assays that were previously published, in a format that allows a meta-analysis of the combined dataset. The combined biochemical and cellular assays presented here suggest mechanisms of action for 135 (34%) of the compounds active in killing multiple life-cycle stages of the malaria parasite, including asexual blood, liver, gametocyte, gametes and insect ookinete stages. In addition, many compounds demonstrated activity against other pathogens, showing hits in assays with 16 protozoa, 7 helminths, 9 bacterial and mycobacterial species, the dengue fever mosquito vector, and the NCI60 human cancer cell line panel of 60 human tumor cell lines. 
Toxicological, pharmacokinetic and metabolic properties were collected on all the compounds, assisting in the selection of the most promising candidates for murine proof-of-concept experiments and medicinal chemistry programs. The data for all of these assays are presented and analyzed to show how outstanding leads for many indications can be selected. These results reveal the immense potential for translating the dispersed expertise in biological assays involving human pathogens into drug discovery starting points, by providing open access to new families of molecules, and emphasize how a small additional investment made to help acquire and distribute compounds, and sharing the data, can catalyze drug discovery for dozens of different indications. Another lesson is that when multiple screens from different groups are run on the same library, results can be integrated quickly to select the most valuable starting points for subsequent medicinal chemistry efforts.
A large proportion of biomedical research and the development of therapeutics is focused on a small fraction of the human genome. In a strategic effort to map the knowledge gaps around proteins encoded by the human genome and to promote the exploration of currently understudied, but potentially druggable, proteins, the US National Institutes of Health launched the Illuminating the Druggable Genome (IDG) initiative in 2014. In this article, we discuss how the systematic collection and processing of a wide array of genomic, proteomic, chemical and disease-related resource data by the IDG Knowledge Management Center have enabled the development of evidence-based criteria for tracking the target development level (TDL) of human proteins, which indicates a substantial knowledge deficit for approximately one out of three proteins in the human proteome. We then present spotlights on the TDL categories as well as key drug target classes, including G protein-coupled receptors, protein kinases and ion channels, which illustrate the nature of the unexplored opportunities for biomedical research and therapeutic development.
The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, using different validation strategies, and different metrics. In this study, different methods were compared using one single standardized dataset obtained from ChEMBL, which is made available to the public, using standardized metrics (BEDROC and Matthews Correlation Coefficient). Specifically, the performance of Naïve Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated using both a random split validation and a temporal validation, with the latter being a more realistic benchmark of expected prospective execution. Deep Neural Networks are the top performing classifiers, highlighting the added value of Deep Neural Networks over other more conventional methods. Moreover, the best method (‘DNN_PCM’) performed significantly better at almost one standard deviation higher than the mean performance. Furthermore, Multi-task and PCM implementations were shown to improve performance over single task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations under the mean performance. Random Forests, Support Vector Machines, and Logistic Regression performed around mean performance. Finally, using an ensemble of DNNs, alongside additional tuning, enhanced the relative performance by another 27% (compared with unoptimized ‘DNN_PCM’). 
Here, a standardized set to test and evaluate different machine learning algorithms in the context of multi-task learning is offered by providing the data and the protocols. Electronic supplementary material: the online version of this article (doi:10.1186/s13321-017-0232-0) contains supplementary material, which is available to authorized users.
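Two ideas from the benchmarking abstract above can be sketched in a few lines: the Matthews Correlation Coefficient computed from confusion-matrix counts, and a temporal split that trains on older records and tests on newer ones. This is a minimal sketch and not the authors' protocol. The record layout (a `year` key) and the `cutoff_year` parameter are assumptions made for illustration, and the abstract does not state the cutoff actually used.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    adopted here as an assumption).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def temporal_split(records, cutoff_year):
    """Split records into (train, test) by publication year.

    Records dated before `cutoff_year` go to training; the rest go to the
    test set, mimicking prospective use of a model.
    """
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test


if __name__ == "__main__":
    # Toy records, invented purely for illustration.
    toy = [{"id": i, "year": y} for i, y in enumerate([2009, 2011, 2013, 2014])]
    train, test = temporal_split(toy, cutoff_year=2013)
    print(len(train), len(test))
    print(matthews_corrcoef(tp=5, fp=0, tn=5, fn=0))
```

Unlike a random split, the temporal split never lets the model see data published after the test compounds, which is why the abstract describes it as the more realistic benchmark.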
Previous studies of the analysis of molecular matched pairs (MMPs) have often assumed that the effect of a substructural transformation on a molecular property is independent of the context (i.e., the local structural environment in which that transformation occurs). Experiments with large sets of hERG, solubility, and lipophilicity data demonstrate that the inclusion of contextual information can enhance the predictive power of MMP analyses, with significant trends (both positive and negative) being identified that are not apparent when using conventional, context-independent approaches.
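The difference between context-independent and context-aware MMP analysis can be illustrated with a small aggregation sketch. It assumes each matched pair has already been reduced to a tuple of (transformation, context label, property change). The tuple layout, the context labels, and the toy numbers are all invented for illustration and do not come from the study.

```python
from collections import defaultdict
from statistics import mean


def mean_effects(pairs, use_context):
    """Average the property change per transformation.

    With use_context=False, all pairs sharing a transformation are pooled.
    With use_context=True, pairs are grouped by (transformation, context).
    """
    groups = defaultdict(list)
    for transformation, context, delta in pairs:
        key = (transformation, context) if use_context else transformation
        groups[key].append(delta)
    return {key: mean(values) for key, values in groups.items()}


if __name__ == "__main__":
    # Toy data: the same transformation with opposite effects in two contexts.
    pairs = [
        ("H>>F", "aromatic", 0.5),
        ("H>>F", "aromatic", 0.5),
        ("H>>F", "aliphatic", -0.5),
        ("H>>F", "aliphatic", -0.5),
    ]
    print(mean_effects(pairs, use_context=False))
    print(mean_effects(pairs, use_context=True))
```

In this toy case the pooled average hides two opposite trends, which is the kind of effect the abstract says conventional, context-independent approaches can miss.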