Quantitative Structure-Activity Relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss: (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communications between computational and experimental chemists towards collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.
The U.S. Environmental Protection Agency's (EPA) ToxCast program is testing a large library of Agency-relevant chemicals using in vitro high-throughput screening (HTS) approaches to support the development of improved toxicity prediction models. Launched in 2007, Phase I of the program screened 310 chemicals, mostly pesticides, across hundreds of ToxCast assay end points. In Phase II, the ToxCast library was expanded to 1878 chemicals, culminating in the public release of screening data at the end of 2013. Subsequent expansion in Phase III has resulted in more than 3800 chemicals actively undergoing ToxCast screening, 96% of which are also being screened in the multi-Agency Tox21 project. The chemical library unpinning these efforts plays a central role in defining the scope and potential application of ToxCast HTS results. The history of the phased construction of EPA's ToxCast library is reviewed, followed by a survey of the library contents from several different vantage points. CAS Registry Numbers are used to assess ToxCast library coverage of important toxicity, regulatory, and exposure inventories. Structure-based representations of ToxCast chemicals are then used to compute physicochemical properties, substructural features, and structural alerts for toxicity and biotransformation. Cheminformatics approaches using these varied representations are applied to defining the boundaries of HTS testability, evaluating chemical diversity, and comparing the ToxCast library to potential target application inventories, such as used in EPA's Endocrine Disruption Screening Program (EDSP). Through several examples, the ToxCast chemical library is demonstrated to provide comprehensive coverage of the knowledge domains and target inventories of potential interest to EPA. Furthermore, the varied representations and approaches presented here define local chemistry domains potentially worthy of further investigation (e.g., not currently covered in the testing library or defined by toxicity "alerts") to strategically support data mining and predictive toxicology modeling moving forward.
Chemotypes are a new approach for representing molecules, chemical substructures and patterns, reaction rules, and reactions. Chemotypes are capable of integrating types of information beyond what is possible using current representation methods (e.g., SMARTS patterns) or reaction transformations (e.g., SMIRKS, reaction SMILES). Chemotypes are expressed in the XML-based Chemical Subgraphs and Reactions Markup Language (CSRML), and can be encoded not only with connectivity and topology but also with properties of atoms, bonds, electronic systems, or molecules. CSRML has been developed in parallel with a public set of chemotypes, i.e., the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory, and commercial-use chemical space, as well as to represent chemical patterns and properties especially relevant to various toxicity concerns. A software application, ChemoTyper has also been developed and made publicly available in order to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as reference implementation so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML-based CSRML standard used to express chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.