Drug-induced liver injury is a major concern and has led to the withdrawal of a significant number of marketed drugs. An understanding of structure-activity relationships (SARs) of chemicals can make a significant contribution to the identification of potential toxic effects early in the drug development process and aid in avoiding such problems. This process can be supported by the use of existing toxicity data and mechanistic understanding of the biological processes for related compounds. In the published literature, this information is often spread across diverse sources and can be varied and unstructured in quality and content. The current work has explored whether it is feasible to collect and use such data for the development of new SARs for the hepatotoxicity endpoint and expand upon the limited information currently available in this area. Reviews of hepatotoxicity data were used to build a structure-searchable database, which was analyzed to identify chemical classes associated with an adverse effect on the liver. Searches of the published literature were then undertaken to identify additional supporting evidence, and the resulting information was incorporated into the database. This collated information was evaluated and used to determine the scope of the SARs for each class identified. Data for over 1266 chemicals were collected, and SARs for 38 classes were developed. The SARs have been implemented as structural alerts using Derek for Windows (DfW), a knowledge-based expert system, to allow clearly supported and transparent predictions. An evaluation exercise performed using a customized DfW version 10 knowledge base demonstrated an overall concordance of 56% and specificity and sensitivity values of 73% and 46%, respectively. The approach taken demonstrates that SARs for complex endpoints can be derived from the published data for use in the in silico toxicity assessment of new compounds.
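The concordance, sensitivity and specificity figures quoted above are standard binary-classification measures. As a minimal sketch of how such values are usually computed from a confusion matrix, assuming the conventional definitions (the abstract does not state the formulas used), the following may help. The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, concordance) as fractions.

    tp/fn: hepatotoxic compounds predicted positive/negative.
    tn/fp: non-hepatotoxic compounds predicted negative/positive.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative compound")
    sensitivity = tp / positives                         # true-positive rate
    specificity = tn / negatives                         # true-negative rate
    concordance = (tp + tn) / (positives + negatives)    # overall agreement
    return sensitivity, specificity, concordance


# Hypothetical counts for illustration only (not data from the evaluation).
sens, spec, conc = classification_metrics(tp=30, fp=10, tn=40, fn=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} concordance={conc:.2f}")
```

Because concordance is a prevalence-weighted average of sensitivity and specificity, it always lies between the two, as the reported 56% does between 46% and 73%.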
Model reliability is generally assessed and reported as an intrinsic component of quantitative structure-activity relationship (QSAR) publications; it can be evaluated using defined quality criteria such as the Organisation for Economic Cooperation and Development (OECD) principles for the validation of QSARs. However, less emphasis is afforded to the assessment of model reproducibility, particularly by users who may wish to use model outcomes for decision making, but who are not QSAR experts. In this study we identified a range of QSARs in the area of absorption, distribution, metabolism, and elimination (ADME) prediction and assessed their adherence to the OECD principles, as well as investigating their reproducibility by scientists without expertise in QSAR. Here, 85 papers were reviewed, reporting over 80 models for 31 ADME-related endpoints. Of these, 12 models were identified that fulfilled at least 4 of the 5 OECD principles and 3 of these 12 could be readily reproduced. Published QSAR models should aim to meet a standard level of quality and be clearly communicated, ensuring their reproducibility, to progress the uptake of the models in both research and regulatory landscapes. A pragmatic workflow for implementing published QSAR models and recommendations to modellers, for publishing models with greater usability, are presented herein.
The cost of in vivo and in vitro screening of ADME properties of compounds has motivated efforts to develop a range of in silico models. At the heart of the development of any computational model are the data; high quality data are essential for developing robust and accurate models. The characteristics of a dataset, such as its availability, size, format and type of chemical identifiers used, influence the modelability of the data. Areas covered: This review explores the usefulness of publicly available ADME datasets for researchers to use in the development of predictive models. More than 140 ADME datasets were collated from publicly available resources and the modelability of 31 selected datasets was assessed using specific criteria derived in this study. Expert opinion: Publicly available datasets differ significantly in information content and presentation. From a modelling perspective, datasets should be of adequate size and available in a user-friendly format, with all chemical structures associated with one or more chemical identifiers suitable for automated processing (e.g. CAS number, SMILES string or InChIKey). Recommendations for assessing dataset suitability for modelling and publishing data in an appropriate format are discussed.
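The requirement that every structure carry an identifier suitable for automated processing can be checked mechanically. The sketch below is one possible check, not the criteria derived in the review: it computes the fraction of records with a CAS number that passes the standard check-digit test, a correctly formatted InChIKey, or a non-empty SMILES string. The record field names (`cas`, `inchikey`, `smiles`) and function names are assumptions made for illustration.

```python
import re

CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_valid_cas(cas):
    """Check CAS number format and its check digit."""
    match = CAS_PATTERN.match(cas.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... from right to left; the sum mod 10
    # must equal the final check digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def is_valid_inchikey(key):
    """Check the 14-10-1 uppercase block layout of a standard InChIKey."""
    return bool(INCHIKEY_PATTERN.match(key.strip()))


def identifier_coverage(records):
    """Fraction of records with at least one usable identifier."""
    if not records:
        return 0.0
    usable = 0
    for record in records:
        has_cas = is_valid_cas(record.get("cas") or "")
        has_key = is_valid_inchikey(record.get("inchikey") or "")
        # SMILES is only checked for presence here; parsing it would
        # need a cheminformatics toolkit.
        has_smiles = bool((record.get("smiles") or "").strip())
        if has_cas or has_key or has_smiles:
            usable += 1
    return usable / len(records)


# Hypothetical records for illustration.
dataset = [
    {"cas": "7732-18-5", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},  # water
    {"cas": "7732-18-4"},                                             # bad check digit
    {"smiles": "CCO"},                                                # ethanol
]
print(f"identifier coverage: {identifier_coverage(dataset):.2f}")
```

A coverage below 1.0 flags records that would need manual curation before the dataset could be used for modelling.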