Chemotypes are a new approach for representing molecules, chemical substructures and patterns, reaction rules, and reactions. Chemotypes are capable of integrating types of information beyond what is possible using current representation methods (e.g., SMARTS patterns) or reaction transformations (e.g., SMIRKS, reaction SMILES). Chemotypes are expressed in the XML-based Chemical Subgraphs and Reactions Markup Language (CSRML), and can be encoded not only with connectivity and topology but also with properties of atoms, bonds, electronic systems, or molecules. CSRML has been developed in parallel with a public set of chemotypes, i.e., the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory, and commercial-use chemical space, as well as to represent chemical patterns and properties especially relevant to various toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as a reference implementation, so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML-based CSRML standard used to express chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge.
The EC number system for the classification of enzymes uses different criteria such as reaction pattern, the nature of the substrate, the type of transferred groups, or the type of acceptor group. These criteria are used with different emphasis for the various enzyme classes and thus do not contribute much to an understanding of the mechanisms of enzyme-catalyzed reactions. To explore the reasons for bonds being broken in enzyme-catalyzed metabolic reactions, we calculated physicochemical effects for the bonds reacting in the substrate of these enzymatic reactions. These descriptors allow the definition of similarities within these reactions and thus can serve as a method for the classification of enzyme reactions. To foster an understanding of the investigations performed here, we compare the similarities found on the basis of the physicochemical effects with the EC number classification. To allow a reasonable comparison, we selected enzymatic reactions where the EC number system is largely built on criteria based on the reaction mechanism. This is true for hydrolysis reactions, which fall into the domain of EC class 3 (EC 3.b.c.d). The comparison is made by a Kohonen neural network based on an unsupervised learning algorithm. For these hydrolysis reactions, the similarity analysis on physicochemical effects produces results that are, by and large, similar to the EC number classification. However, this similarity analysis reveals finer details of the enzymatic reactions and thus can provide a better basis for the mechanistic comparison of enzymes.
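To make the idea of an unsupervised Kohonen map concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the grid size, the learning schedule, and the three-component toy descriptor vectors are all assumptions chosen for illustration. Each reaction is taken to be a fixed-length vector of descriptors scaled to [0, 1]; similar vectors are expected to end up on the same or neighbouring map nodes.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x (squared Euclidean)."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular Kohonen map; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0  # initial neighbourhood radius (assumed schedule)
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3    # neighbourhood shrinks over time
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


if __name__ == "__main__":
    # Hypothetical descriptor vectors for four reactions (illustrative values only).
    reactions = [
        [0.10, 0.10, 0.10],
        [0.12, 0.08, 0.10],
        [0.90, 0.90, 0.90],
        [0.88, 0.92, 0.90],
    ]
    som = train_som(reactions, rows=3, cols=3)
    for vec in reactions:
        print(vec, "->", best_matching_unit(som, vec))
```

Comparing the node assigned to each reaction with its EC subclass would then be one way to inspect how far the two groupings agree.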
The correct identification of the reacting bonds and atoms is a prerequisite for the analysis of the reaction mechanism. We have recently developed a method based on the Imaginary Transition State Energy Minimization (ITSE) approach for automatically determining the reaction center information and the atom-atom mapping numbers. We test here the accuracy of this ITSE approach by comparing the predictions of the method against more than 1500 manually annotated reactions from BioPath, a comprehensive database of biochemical reactions. The results show high agreement between manually annotated mappings and computational predictions (98.4%), with significant discrepancies in only 24 cases out of 1542 (1.6%). This result validates both the computational prediction and the database: the results of the former agree with expert knowledge, and the latter appears largely self-consistent and consistent with a simple principle. In 10 of the discrepant cases, simple chemical arguments or independent literature studies support the predicted reaction center. In five reaction instances, the differences between the automatically and manually annotated mappings are described in detail. Finally, in approximately 200 cases the algorithm finds alternate reaction centers, which need to be studied on a case-by-case basis, as the exact choice of the alternative may depend on the enzyme catalyzing the reaction.
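The reported percentages follow directly from the two counts given in the abstract (1542 reactions, 24 discrepancies). A short check, with a helper name (`agreement_stats`) that is purely illustrative:

```python
def agreement_stats(total, discrepant):
    """Return (percent agreeing, percent discrepant), each rounded to one decimal."""
    agree = total - discrepant
    return round(100 * agree / total, 1), round(100 * discrepant / total, 1)


if __name__ == "__main__":
    # Counts taken from the abstract: 1542 annotated reactions, 24 discrepancies.
    print(agreement_stats(1542, 24))  # (98.4, 1.6)
```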
The Biochemical Pathways Wall Chart (http://www.expasy.org/tools/pathways/ref.1) has been converted into a molecule and reaction database. Two major features of this database are that each molecule is represented by lists of all its atoms and bonds (as connection tables), and that in each reaction the reaction centre (the atoms and bonds directly involved in the bond rearrangement process) is marked. The information in the database has been enriched by a set of diverse 3D structure conformations generated by the programs CORINA and ROTATE. The web-based structure and reaction retrieval system C@ROL provides a wide range of search methods to mine this rich database. The database is accessible at http://www2.chemie.uni-erlangen.de/services/biopath/index.html and http://www.mol-net.de/databases/biopath.html.
The incompleteness of genome-scale metabolic models is a major bottleneck for systems biology approaches, which are based on large numbers of metabolites as identified and quantified by metabolomics. Many of the revealed secondary metabolites and/or their derivatives, such as flavor compounds, are non-essential in metabolism, and many of their synthesis pathways are unknown. In this study, we describe a novel approach, Reverse Pathway Engineering (RPE), which combines chemoinformatics and bioinformatics analyses to predict the “missing links” between compounds of interest and their possible metabolic precursors by providing plausible chemical and/or enzymatic reactions. We demonstrate the added value of the approach by using flavor-forming pathways in lactic acid bacteria (LAB) as an example. Established metabolic routes leading to the formation of flavor compounds from leucine were successfully replicated. Novel reactions involved in flavor formation, i.e., the conversion of alpha-hydroxy-isocaproate to 3-methylbutanoic acid and the synthesis of dimethyl sulfide, as well as the enzymes involved, were successfully predicted. These new insights into the flavor-formation mechanisms in LAB can have a significant impact on improving the control of aroma formation in fermented food products. Since the input reaction databases and compounds are highly flexible, the RPE approach can be easily extended to a broad spectrum of applications, including health and disease biomarker discovery and synthetic biology.
Organic reactions occur as a result of complicated interactions among many factors: structural and electronic features of reactants, reagents, catalysts, temperature, etc. In this study, organic reactions were automatically classified based on these factors. A dataset of 131 reactions was investigated, focusing on the changes of electronic features on the oxygen atoms at the reaction sites, by principal component analysis and self-organizing neural network analyses. Good correlations were found between the similarities in the changes of the electronic features of oxygen atoms at the reaction sites and the similarities in the substructural transformations at the reaction sites, as well as with the known reaction types. These results demonstrate that a classification based on changes of electronic features is closely related to the classifications which chemists have been establishing from various points of view. Furthermore, this indicates the possibility of automatic and systematic classification of a large number of organic reactions.
Highlights
The historical development of the public COSMOS database is provided.
COSMOS NG is a knowledge hub to share toxicity data and in silico tools.
COSMOS NG has broad chemical coverage, with a focus on cosmetics.
Chemical and toxicological data are quality assured through inclusion criteria.
In silico TTC, profiling and read-across workflows are illustrated.
Drug-induced liver injury (DILI) remains a challenge when translating knowledge from the preclinical stage to human use cases. Attempts to model human DILI directly based on the information from drug labels have had some success; however, the approach falls short of providing insights or addressing uncertainty due to the difficulty of decoupling the idiosyncratic nature of human DILI outcomes. Our approach in this comparative analysis is to leverage existing preclinical and clinical data as well as information on metabolism to better translate mammalian to human DILI. The human DILI knowledge base from the United States Food and Drug Administration (U.S. FDA) National Center for Toxicology Research contains 1036 pharmaceuticals from diverse therapeutic categories. A human DILI training set of 305 oral marketed drugs was prepared and a binary classification scheme applied. The second knowledge base consists of mammalian repeated dose toxicity with liver toxicity data from various regulatory sources. Within this knowledge base, we identified 278 pharmaceuticals containing 198 marketed or withdrawn oral drugs with data from the U.S. FDA new drug application and 98 active pharmaceutical ingredients from ToxCast. From this collection, a set of 225 oral drugs was prepared as the mammalian hepatotoxicity training set with particular end points of pathology findings in the liver and bile duct. Both human and mammalian data sets were processed using various learning algorithms, including artificial intelligence approaches. The external validations for both models were comparable to the training statistics. These data sets were also used to extract species-differentiating chemotypes that differentiate DILI effects on humans from mammals. A systematic workflow was devised to predict human DILI and provide mechanistic insights. For a given query molecule, both human and mammalian models are run. If the predictions are discordant, both metabolites and parents are investigated for quantitative structure–activity relationship and species-differentiating chemotypes. Their results are combined using the Dempster–Shafer decision theory to yield a final outcome prediction for human DILI with estimated uncertainty. Finally, these tools are implementable within an in silico platform for systematic evaluation.
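As an illustration of how Dempster–Shafer theory can merge two lines of evidence into one prediction with an explicit uncertainty term, here is a minimal sketch of Dempster's rule of combination over a two-outcome frame. The abstract does not say how evidence masses are assigned in the actual workflow, so the frame labels, the function names, and the numeric masses below are all hypothetical.

```python
def combine(m1, m2):
    """Dempster's rule: combine two mass functions given as {frozenset: mass}."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m, hypothesis):
    """Total mass of all focal sets contained in the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass of all focal sets that intersect the hypothesis."""
    return sum(v for s, v in m.items() if s & hypothesis)


if __name__ == "__main__":
    POS = frozenset({"DILI positive"})
    NEG = frozenset({"DILI negative"})
    EITHER = POS | NEG  # mass on the whole frame represents "don't know"

    # Hypothetical evidence: one source leans positive, the other leans negative.
    source_a = {POS: 0.6, EITHER: 0.4}
    source_b = {NEG: 0.5, EITHER: 0.5}

    fused = combine(source_a, source_b)
    print("belief(positive)       =", belief(fused, POS))
    print("plausibility(positive) =", plausibility(fused, POS))
```

The gap between belief and plausibility for a hypothesis is one common way to read off the remaining uncertainty after the sources are combined.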