Solute descriptors have been widely used to model chemical transfer processes through poly-parameter linear free energy relationships (pp-LFERs); however, obtaining these descriptors accurately and quickly for new organic chemicals remains difficult. In this research, we built models (PaDEL-DNN) that require only the SMILES of a chemical and satisfactorily estimate pp-LFER descriptors using deep neural networks (DNN) and the PaDEL chemical representation. The PaDEL-DNN-estimated pp-LFER descriptors performed well in modeling the storage-lipid/water partition coefficient (log K(storage-lipid/water)), bioconcentration factor (BCF), aqueous solubility (ESOL), and hydration free energy (FreeSolv). Then, assuming that the accuracy of estimates for widely available properties, e.g., logP (octanol−water partition coefficient), can calibrate estimates for less available but related properties, we proposed logP as a surrogate metric for evaluating the overall accuracy of the estimated pp-LFER descriptors. When the pp-LFER descriptors were used to model log K(storage-lipid/water), BCF, ESOL, and FreeSolv, errors were around 0.1 log unit lower for chemicals whose estimated descriptors the surrogate metric deemed "accurate". Interpretation of the PaDEL-DNN models revealed that, for a given test chemical, having several (around 5) "similar" chemicals in the training data set was crucial for accurate estimation, while the remaining, less similar training chemicals provided reasonable baseline estimates. Lastly, pp-LFER descriptors for over 2800 persistent, bioaccumulative, and toxic chemicals were reasonably estimated by combining PaDEL-DNN with the surrogate metric. Overall, the PaDEL-DNN/surrogate metric approach and the newly estimated descriptors will greatly benefit chemical transfer modeling.
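The surrogate-metric idea can be sketched as follows, as a minimal illustration only: the abstract does not state the equation, coefficients, or acceptance threshold the authors used. This sketch assumes the standard Abraham-type pp-LFER form logP = c + eE + sS + aA + bB + vV, with commonly cited octanol-water coefficients (an assumption; verify them against a primary source), and a hypothetical tolerance `tol` for deciding whether a set of estimated descriptors counts as "accurate". The names `Descriptors`, `pplfer_logp`, and `passes_surrogate` are illustrative, and the benzene descriptor values are commonly tabulated ones used here only as an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Abraham-type solute descriptors (illustrative container)."""
    E: float  # excess molar refraction
    S: float  # dipolarity/polarizability
    A: float  # hydrogen-bond acidity
    B: float  # hydrogen-bond basicity
    V: float  # McGowan characteristic volume, (cm^3/mol)/100


# ASSUMPTION: commonly cited octanol-water pp-LFER coefficients; verify before use.
LOGP_COEFFS = {"c": 0.088, "e": 0.562, "s": -1.054, "a": 0.034, "b": -3.460, "v": 3.814}


def pplfer_logp(d: Descriptors, k: dict = LOGP_COEFFS) -> float:
    """Predict logP from descriptors: logP = c + eE + sS + aA + bB + vV."""
    return (k["c"] + k["e"] * d.E + k["s"] * d.S + k["a"] * d.A
            + k["b"] * d.B + k["v"] * d.V)


def passes_surrogate(d: Descriptors, logp_exp: float, tol: float = 0.5) -> bool:
    """Deem descriptors 'accurate' if the logP they imply lies within tol log
    units of the experimental logP. tol = 0.5 is a HYPOTHETICAL threshold."""
    return abs(pplfer_logp(d) - logp_exp) <= tol


if __name__ == "__main__":
    # Illustrative descriptor values commonly tabulated for benzene (logP about 2.13).
    benzene = Descriptors(E=0.610, S=0.52, A=0.00, B=0.14, V=0.7164)
    print(round(pplfer_logp(benzene), 2))   # 2.13
    print(passes_surrogate(benzene, 2.13))  # True
```

Because logP is measured for far more chemicals than, say, storage-lipid/water partition coefficients, a check like this can be run on many chemicals before the descriptors are used for the less-characterized properties.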
Environmental chemical reactions have been investigated frequently for various purposes; however, accurately modeling either reaction kinetics or reaction pathways remains challenging. Existing studies mostly model reaction kinetics with traditional quantitative structure−activity relationships (QSARs) and reaction pathways with reaction template methods; however, these approaches generally require extensive feature engineering or manual extraction of reaction templates. Machine learning (ML) has recently become a promising tool for modeling chemical reactions because ML models can achieve strong predictive performance and can make effective use of diverse chemical representations. This Review starts with a concise comparison of traditional and ML approaches to modeling chemical reactions, followed by a brief discussion of the current status of, and future needs in, modeling environmental organic reactions. Data collection and data cleaning techniques for reaction kinetics and pathways are then discussed. We then summarize the advantages and limitations of commonly used chemical representations and feature selection techniques. Next, we critically review general ML model evaluation and interpretation processes and propose a three-step evaluation process: comparison with general metrics, with baseline models, and with existing models. Lastly, we explore ML approaches for small data sets, including transfer learning and active learning, which have been employed successfully in many other fields, for future modeling of environmental chemical reactions.
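The three-step evaluation process can be illustrated with a small sketch. It assumes a regression task (e.g., predicting rate constants) scored by RMSE and R², a naive baseline that always predicts the training-set mean, and predictions from an existing model on the same test set. These metric and baseline choices, and the function name `three_step_evaluation`, are assumptions for illustration, not the Review's prescribed procedure.

```python
import math
from statistics import mean


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def scores(y_true, y_pred):
    return {"rmse": rmse(y_true, y_pred), "r2": r2(y_true, y_pred)}


def three_step_evaluation(y_train, y_test, new_pred, existing_pred):
    # Step 1: general metrics for the new model on the test set.
    report = {"new": scores(y_test, new_pred)}
    # Step 2: a naive baseline that always predicts the training-set mean.
    baseline_pred = [mean(y_train)] * len(y_test)
    report["baseline"] = scores(y_test, baseline_pred)
    # Step 3: an existing model evaluated on the same test set.
    report["existing"] = scores(y_test, existing_pred)
    report["beats_baseline"] = report["new"]["rmse"] < report["baseline"]["rmse"]
    report["beats_existing"] = report["new"]["rmse"] < report["existing"]["rmse"]
    return report


if __name__ == "__main__":
    rep = three_step_evaluation(
        y_train=[1.0, 2.0, 3.0], y_test=[1.0, 3.0],
        new_pred=[1.1, 2.8], existing_pred=[1.5, 2.5])
    print(rep["beats_baseline"], rep["beats_existing"])  # True True
```

A good score in step 1 alone can be misleading; steps 2 and 3 show whether the model adds anything over a trivial predictor and over what is already available.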
Iron-associated reductants play a crucial role in providing electrons for various reductive transformations. However, the complexity of these systems has impeded the development of reliable tools for predicting abiotic reduction rate constants (logk). Our recent study developed a machine learning (ML) model based on the reactivity of 60 organic compounds toward one soluble Fe(II)-reductant. In this study, we built a comprehensive kinetic data set covering the reactivity of 117 organic and 10 inorganic compounds toward four major types of Fe(II)-associated reductants. Separate ML models were developed for organic and inorganic compounds, and feature importance analysis demonstrated the significance of resonance structures, reducible functional groups, reductant descriptors, and pH in logk prediction. Mechanistic interpretation validated that the models accurately learned the impact of factors such as aromatic substituents, complexation, bond dissociation energy, reduction potential, LUMO energy, and dominant reductant species. Finally, we found that 38% of the 850,000 compounds in the Distributed Structure-Searchable Toxicity (DSSTox) database contain at least one reducible functional group, and that the logk of 285,184 compounds could be reasonably predicted using our model. Overall, the study is a significant step toward reliable predictive tools for anticipating abiotic reduction rate constants in iron-associated reductant systems.
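The abstract does not say which feature-importance method was used. Permutation importance is one common, model-agnostic option, sketched here under that assumption: each feature column is shuffled in turn, and the resulting increase in RMSE indicates how much the model relies on that feature. The function `permutation_importance` and the toy model in the usage example are hypothetical.

```python
import math
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column is shuffled.

    predict: callable mapping one feature row (list of floats) to a prediction.
    X: list of feature rows; y: list of observed targets (e.g., logk values).
    """
    rng = random.Random(seed)
    base = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(r) for r in X_perm]) - base)
        importances.append(mean(increases))
    return importances


if __name__ == "__main__":
    # Hypothetical toy data: the "model" uses feature 0 and ignores feature 1.
    X = [[float(i), float(i % 3)] for i in range(8)]
    y = [2.0 * row[0] for row in X]
    imp = permutation_importance(lambda row: 2.0 * row[0], X, y)
    print(imp[0] > 0.0, imp[1] == 0.0)  # True True
```

Shuffling rather than deleting a feature keeps the model unchanged, so the same trained model can be probed for every descriptor (e.g., pH or a reductant descriptor) without retraining.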